METHODS OF DIAGNOSIS AND TREATMENT OF ANDROGEN-DEPENDENT
PROSTATE CANCER, PROSTATE CANCER UNDERGOING ANDROGEN 
WITHDRAWAL, AND ANDROGEN-INDEPENDENT PROSTATE CANCER 

CROSS-REFERENCES TO RELATED APPLICATIONS 
This application claims priority from the following applications: USSN 60/295,917,
filed June 4, 2001; USSN 60/368,689, filed March 29, 2002; USSN 60/350,666, filed
November 13, 2001; and USSN 60/372,246, filed April 12, 2002; each of which is
incorporated herein by reference in its entirety.

FIELD OF THE INVENTION 
The invention relates to the identification of nucleic acid and protein expression
profiles and nucleic acids, products, and antibodies thereto that are involved in prostate 
cancer; and to the use of such expression profiles and compositions in the diagnosis, 
prognosis, and therapy of prostate cancer. The invention further relates to methods for 
identifying and using agents and/or targets that inhibit prostate cancer. 

BACKGROUND OF THE INVENTION
Prostate cancer is the most frequently diagnosed cancer and the second leading cause
of male cancer death in North America and northern Europe. Early detection of prostate
cancer using a serum test for prostate-specific antigen (PSA) has dramatically improved the
treatment of the disease (Oesterling (1992) J. Am. Med. Assoc. 267:2236-2238). Treatment
of prostate cancer consists largely of surgical prostatectomy, radiation therapy, androgen
ablation therapy and chemotherapy. Although many prostate cancer patients are effectively
treated, the current therapies can all induce serious side effects which diminish quality of life.
Patients who present with metastatic disease are most often treated with androgen-ablation
therapy. Hormone blockade results in significant regression of the tumor. However, this
treatment rarely cures the patient and invariably results in progression to androgen-independent
disease, which is incurable. Afrin and Stuart (1994) J. S.C. Med. Assoc. 90:231-236.

The identification of novel therapeutic targets and diagnostic markers is essential for
improving the current treatment of prostate cancer patients. Recent advances in molecular
medicine have increased the interest in tumor-specific cell surface antigens that could serve
as targets for various immunotherapeutic or small molecule strategies. Antigens suitable for
immunotherapeutic strategies should be highly expressed in cancer tissues and ideally not
expressed in normal adult tissues. Expression in tissues that are dispensable for life,
however, may be tolerated. Examples of such antigens include Her2/neu and the B-cell
antigen CD20. Humanized monoclonal antibodies directed to Her2/neu (Herceptin) are
currently in use for the treatment of metastatic breast cancer. Ross and Fletcher (1998) Stem
Cells 16:413-428. Similarly, anti-CD20 monoclonal antibodies (Rituxan) are used to
effectively treat non-Hodgkin's lymphoma. Maloney, et al. (1997) Blood 90:2188-2195;
Leget and Czuczman (1998) Curr. Opin. Oncol. 10:548-551.

Several potential immunotherapeutic targets have been identified for prostate cancer.
They include prostate-specific membrane antigen (PSMA) (Israeli, et al. (1993) Cancer Res.
53:227-230), prostate stem cell antigen (PSCA; Reiter, et al. (1998) Proc. Natl. Acad. Sci.
USA 95:1735-1740), and serpentine transmembrane epithelial antigen of the prostate
(STEAP; Hubert, et al. (1999) Proc. Natl. Acad. Sci. USA 96:14529-14534). PSMA is a type
II transmembrane hydrolase with significant homology to a rat neuropeptidase (Carter, et al.
(1996) Proc. Natl. Acad. Sci. USA 93:749-753). Antibodies directed towards PSMA are
currently being used to detect metastasized prostate cancer as the Prostascint Scan (Sodee, et
al. (1996) Clin. Nucl. Med. 21:759-767) and are also being evaluated for treatment of
advanced disease (Gregorakis, et al. (1998) Semin. Urol. Oncol. 16:2-12; Liu, et al. (1998)
Cancer Res. 58:4055-4060; Murphy, et al. (1998) J. Urol. 160:2396-2401). In a study on
bone metastasis of prostate cancer, only 8 out of 18 patient samples expressed PSMA (Silver,
et al. (1997) Clin. Cancer Res. 3:81-85). Therefore, it is clear that other targets need to be
identified to manage metastasized disease. PSCA is a member of the Thy-1/Ly-6 family of
glycosylphosphatidylinositol-linked plasma membrane proteins (Reiter, et al. (1998) Proc.
Natl. Acad. Sci. USA 95:1735-1740). Immunohistochemical data show that PSCA is up-regulated
in the majority of prostate cancer epithelia and is also detected in bone metastasis
(Gu, et al. (2000) Oncogene 19:1288-1296). Recent work shows that antibodies directed to
PSCA can prevent metastatic spread of prostate cancer in a mouse model (Saffran, et al.
(2001) Proc. Natl. Acad. Sci. USA 98:2658-2663). STEAP is a multi-transmembrane
prostate-specific protein that may function as a channel or transporter protein (Hubert, et al.
(1999) Proc. Natl. Acad. Sci. USA 96:14529-14534). Its protein expression is specific to the
basolateral membranes of normal prostate and prostate cancer epithelia. STEAP expression
was most highly concentrated at cell-cell boundaries, implying a potential function in
intercellular communication. Therapeutic monoclonal antibodies have so far not been
reported for STEAP.



SUMMARY OF THE INVENTION 

The present invention therefore provides nucleotide sequences of genes that are up-
and down-regulated in androgen-independent prostate cancer cells or prostate cells
undergoing androgen withdrawal. Such genes are useful for diagnostic purposes, and also as
targets for screening for therapeutic compounds that modulate prostate cancer, such as
hormones or antibodies. Other aspects of the invention will become apparent to the skilled
artisan by the following description of the invention.

In one aspect, the present invention provides a method of detecting an androgen-independent
prostate cancer-associated transcript in a cell from a patient, the method
comprising contacting a biological sample from the patient with a polynucleotide that
selectively hybridizes to a nucleic acid molecule comprising a sequence at least 80% identical
to a sequence as shown in Tables 1A-4.

In one embodiment, the present invention provides a method of determining the level 
of a prostate cancer-associated transcript in a cell from a patient.

In one embodiment, the present invention provides a method of detecting a prostate
cancer-associated transcript in a cell from a patient, the method comprising contacting a
biological sample from the patient with a polynucleotide that selectively hybridizes to a
sequence at least 80% identical to a sequence as shown in Tables 1A-4.

In various embodiments, the polynucleotide selectively hybridizes to a sequence at
least 95% identical to a sequence as shown in Tables 1A-4; the polynucleotide comprises a
sequence as shown in Tables 1A-4; the biological sample is a tissue sample; the biological
sample comprises isolated nucleic acids, e.g., mRNA; the polynucleotide is labeled, e.g., with
a fluorescent label; the polynucleotide is immobilized on a solid surface; the patient is
undergoing a therapeutic regimen to treat prostate cancer; the patient is suspected of having
metastatic prostate cancer; the patient is a human; the patient is suspected of having a
taxol-resistant cancer; or the prostate cancer-associated transcript is mRNA.

In other embodiments, the method further comprises the step of amplifying nucleic
acids before the step of contacting the biological sample with the polynucleotide.

In another aspect, the present invention provides a method of monitoring the efficacy
of a therapeutic treatment of prostate cancer, the method comprising the steps of: (i)
providing a biological sample from a patient undergoing the therapeutic treatment; and (ii)
determining the level of a prostate cancer-associated transcript in the biological sample by
contacting the biological sample with a polynucleotide that selectively hybridizes to a
sequence at least 80% identical to a sequence as shown in Tables 1A-4, thereby monitoring
the efficacy of the therapy. In a further embodiment, the patient has metastatic prostate
cancer. In a further embodiment, the patient has a drug-resistant (e.g., taxol-resistant) form of
prostate cancer.

In one embodiment, the method further comprises the step of: (iii) comparing the
level of the prostate cancer-associated transcript to a level of the prostate cancer-associated
transcript in a biological sample from the patient prior to, or earlier in, the therapeutic
treatment.

Additionally, provided herein is a method of evaluating the effect of a candidate
prostate cancer drug comprising administering the drug to a patient and removing a cell
sample from the patient. The expression profile of the cell is then determined. This method
may further comprise comparing the expression profile to an expression profile of a healthy
individual. In a preferred embodiment, said expression profile includes a gene of Tables 1A-4.

In one aspect, the present invention provides an isolated nucleic acid molecule
consisting of a polynucleotide sequence as shown in Tables 1A-4.

In one embodiment, an expression vector or cell comprises the isolated nucleic acid.

In one aspect, the present invention provides an isolated polypeptide which is encoded
by a nucleic acid molecule having a polynucleotide sequence as shown in Tables 1A-4.

In another aspect, the present invention provides an antibody that specifically binds to
an isolated polypeptide which is encoded by a nucleic acid molecule having a polynucleotide
sequence as shown in Tables 1A-4.

In certain embodiments, the antibody is conjugated to an effector component, e.g., a
fluorescent label, a radioisotope or a cytotoxic chemical; the antibody is an antibody
fragment; or the antibody is humanized.

In one aspect, the present invention provides a method of detecting a prostate cancer
cell in a biological sample from a patient, the method comprising contacting the biological
sample with an antibody as described herein.

In another aspect, the present invention provides a method of detecting antibodies
specific to prostate cancer in a patient, the method comprising contacting a biological sample
from the patient with a polypeptide encoded by a nucleic acid comprising a sequence from
Tables 1A-4.

In another aspect, the present invention provides a method for identifying a compound
that modulates a prostate cancer-associated polypeptide, the method comprising the steps of:
a) contacting the compound with a prostate cancer-associated polypeptide, the polypeptide
encoded by a polynucleotide that selectively hybridizes to a sequence at least 80% identical
to a sequence as shown in Tables 1A-4; and b) determining the functional effect of the
compound upon the polypeptide.

In one embodiment, the functional effect is a physical effect, an enzymatic effect, or a 
chemical effect. 

In one embodiment, the polypeptide is expressed in a eukaryotic host cell or cell
membrane. In another embodiment, the polypeptide is recombinant.

In one embodiment, the functional effect is determined by measuring ligand binding
to the polypeptide.

In another aspect, the present invention provides a method of inhibiting proliferation
of a prostate cancer-associated cell to treat prostate cancer in a patient, the method
comprising the step of administering to the patient a therapeutically effective amount of a
compound identified as described herein.

In one embodiment, the compound is an antibody. 

In another aspect, the present invention provides a drug screening assay comprising
the steps of: a) administering a test compound to a mammal having prostate cancer or to a
cell sample isolated therefrom; b) comparing the level of gene expression of a polynucleotide
that selectively hybridizes to a sequence at least 80% identical to a sequence as shown in
Tables 1A-4 in a treated cell or mammal with the level of gene expression of the
polynucleotide in a control cell sample or mammal, wherein a test compound that modulates
the level of expression of the polynucleotide is a candidate for the treatment of prostate
cancer.
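
By way of illustration only, the comparison in step b) might be sketched as a fold-change check between treated and control expression levels. The function name, the fold-change cutoff of 2.0, and the data layout are assumptions made for this sketch; the specification does not prescribe a particular statistic or threshold.

```python
def modulated_transcripts(treated, control, min_fold=2.0):
    """Return transcripts whose level differs between treated and control.

    treated, control: dicts mapping a transcript identifier to a positive
    expression level. min_fold is a hypothetical cutoff, not a value taken
    from the specification.
    """
    hits = {}
    for name, control_level in control.items():
        if name not in treated or control_level <= 0 or treated[name] <= 0:
            continue  # skip transcripts that cannot be compared as a ratio
        ratio = treated[name] / control_level
        if ratio >= min_fold:
            hits[name] = "up"
        elif ratio <= 1.0 / min_fold:
            hits[name] = "down"
    return hits
```

A test compound for which this function returns a non-empty result would, under these assumptions, be flagged as modulating expression and so be a candidate for follow-up.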

In one embodiment, the control is a mammal with prostate cancer or a cell sample
therefrom that has not been treated with the test compound. In another embodiment, the
control is a normal cell or mammal.

In one embodiment, the test compound is administered in varying amounts or
concentrations. In another embodiment, the test compound is administered for varying time
periods. In another embodiment, the comparison can occur after addition or removal of the
drug candidate.

In one embodiment, the levels of a plurality of polynucleotides that selectively
hybridize to a sequence at least 80% identical to a sequence as shown in Tables 1A-4 are
individually compared to their respective levels in a control cell sample or mammal. In a
preferred embodiment the plurality of polynucleotides is from three to ten.

In another aspect, the present invention provides a method for treating a mammal
having prostate cancer comprising administering a compound identified by the assay
described herein.

In another aspect, the present invention provides a pharmaceutical composition for
treating a mammal having prostate cancer, the composition comprising a compound
identified by the assay described herein and a physiologically acceptable excipient.

In one aspect, the present invention provides a method of screening drug candidates
by providing a cell expressing a gene that is up- and down-regulated as in a prostate cancer.
In one embodiment, a gene is selected from Tables 1A-4. The method further includes
adding a drug candidate to the cell and determining the effect of the drug candidate on the
expression of the expression profile gene.

In one embodiment, the method of screening drug candidates includes comparing the
level of expression in the absence of the drug candidate to the level of expression in the
presence of the drug candidate, wherein the concentration of the drug candidate can vary
when present, and wherein the comparison can occur after addition or removal of the drug
candidate. In a preferred embodiment, the cell expresses at least two expression profile
genes. The profile genes may show an increase or decrease.



Also provided is a method of evaluating the effect of a candidate prostate cancer drug
comprising administering the drug to a transgenic animal expressing or over-expressing the
prostate cancer modulatory protein, or an animal lacking the prostate cancer modulatory
protein, for example as a result of a gene knockout.

Moreover, provided herein is a biochip comprising one or more nucleic acid segments
of Tables 1A-4, wherein the biochip comprises fewer than 1000 nucleic acid probes.
Preferably, at least two nucleic acid segments are included. More preferably, at least three
nucleic acid segments are included.

Furthermore, a method of diagnosing a disorder associated with prostate cancer is
provided. The method comprises determining the expression of a gene of Tables 1A-4, in a
first tissue type of a first individual, and comparing the distribution to the expression of the
gene from a second normal tissue type from the first individual or a second unaffected
individual. A difference in the expression indicates that the first individual has a disorder
associated with prostate cancer.

In a further embodiment, the biochip also includes a polynucleotide sequence of a
gene that is not up- and down-regulated in prostate cancer.

In one embodiment, a method is provided for screening for a bioactive agent capable of
interfering with the binding of a prostate cancer modulating protein (prostate cancer modulatory
protein) or a fragment thereof and an antibody which binds to said prostate cancer modulatory
protein or fragment thereof. In a preferred embodiment, the method comprises combining a prostate
cancer modulatory protein or fragment thereof, a candidate bioactive agent and an antibody
which binds to said prostate cancer modulatory protein or fragment thereof. The method
further includes determining the binding of said prostate cancer modulatory protein or
fragment thereof and said antibody. Where there is a change in binding, an agent is
identified as an interfering agent. The interfering agent can be an agonist or an antagonist.
Preferably, the agent inhibits prostate cancer.

Also provided herein are methods of eliciting an immune response in an individual.
In one embodiment a method provided herein comprises administering to an individual a
composition comprising a prostate cancer modulating protein, or a fragment thereof. In
another embodiment, the protein is encoded by a nucleic acid selected from those of Tables
1A-4.

Further provided herein are compositions capable of eliciting an immune response in
an individual. In one embodiment, a composition provided herein comprises a prostate
cancer modulating protein, preferably encoded by a nucleic acid of Tables 1A-4, or a
fragment thereof, and a pharmaceutically acceptable carrier. In another embodiment, said
composition comprises a nucleic acid comprising a sequence encoding a prostate cancer
modulating protein, preferably selected from the nucleic acids of Tables 1A-4, and a
pharmaceutically acceptable carrier.

Also provided are methods of neutralizing the effect of a prostate cancer protein, or a
fragment thereof, comprising contacting an agent specific for said protein with said protein in
an amount sufficient to effect neutralization. In another embodiment, the protein is encoded
by a nucleic acid selected from those of Tables 1A-4. In another aspect of the invention, a
method of treating an individual for prostate cancer is provided. In one embodiment, the
method comprises administering to said individual an inhibitor of a prostate cancer
modulating protein. In another embodiment, the method comprises administering to a patient
having prostate cancer an antibody to a prostate cancer modulating protein conjugated to a
therapeutic moiety. Such a therapeutic moiety can be a cytotoxic agent or a radioisotope.

DETAILED DESCRIPTION OF THE INVENTION
In accordance with the objects outlined above, the present invention provides novel
methods for diagnosis and evaluation of androgen-dependent prostate cells (malignant or
non-malignant), prostate cells undergoing androgen withdrawal, and androgen-independent
prostate cancer, as well as methods for treating androgen-dependent prostate cells (malignant
or non-malignant), prostate cancer undergoing androgen withdrawal, and androgen-independent
prostate cancer. The current Specification incorporates the text of USSN
09/976,858, filed October 12, 2001; USSN 60/295,917, filed June 4, 2001; USSN
60/368,689, filed March 29, 2002; USSN 60/350,666, filed November 13, 2001; and USSN
60/372,246, filed April 12, 2002.

Table 1A provides unigene cluster identification numbers for the nucleotide sequence
of genes that exhibit increased or decreased expression in androgen-independent prostate
cancer samples. Table 1A also provides an exemplar accession number that provides a
nucleotide sequence that is part of the unigene cluster. The expression patterns of the genes
of Table 1A can be broadly defined into the following categories:



WO 02/098358



PCT/US02/17594 



Genes that are expressed early in the time course, then drop off in expression, and then
express again with emergence of androgen-independence (hi-lo-hi pattern in Table 1A).
Genes that are expressed early in the time course, then drop off in expression, and do not
express again with emergence of androgen-independence (hi-lo-lo pattern in Table 1A). Genes that
are not expressed early in the time course, but express only with emergence of
androgen-independence (lo-lo-hi pattern in Table 1A). Genes that are not expressed early in the time
course, but then express as androgen is withdrawn and continue to express with emergence of
androgen-independence (lo-hi-hi pattern in Table 1A). Genes that are not expressed early in
the time course, but then express as androgen is withdrawn and drop off again with
emergence of androgen-independence (lo-hi-lo pattern in Table 1A).

Tables 2A-C provide unigene cluster identification numbers for the nucleotide
sequence of genes that exhibit increased or decreased expression in androgen-dependent
prostate cancer, prostate cancer undergoing androgen withdrawal and androgen-independent
prostate cancer. Tables 2A-C also provide an exemplar accession number that provides a
nucleotide sequence that is part of the unigene cluster. The expression patterns of the genes
of Tables 2A-C can be broadly defined into the following 6 categories:

Genes that are expressed early in the time course of androgen withdrawal, then drop
off in expression, and then express again with emergence of androgen-independence
(hi-lo-lo-hi pattern in Table 2A). Genes that are expressed early in the time course, then drop off in
expression immediately after androgen-withdrawal, and do not express again with emergence
of androgen-independence (hi-lo-lo-lo pattern in Table 2A). Genes that are expressed early
in the time course, then drop off in expression after several days of androgen withdrawal, and
do not express again with emergence of androgen-independence (hi-hi-lo-lo pattern in Table
2A). Genes that are not expressed early in the time course, but express only with emergence
of androgen-independence (lo-lo-lo-hi pattern in Table 2A). Genes that are not expressed
early in the time course, but then express as androgen is withdrawn and continue to express
with emergence of androgen-independence (lo-lo-hi-hi pattern in Table 2A). Genes that are
not expressed early in the time course, but then express as androgen is withdrawn and drop
off again with emergence of androgen-independence (lo-lo-hi-lo pattern in Table 2A).
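
The hi/lo pattern labels used above can be illustrated with a short sketch that labels each time point of an expression time course and groups genes by the resulting pattern. The function names and the single cutoff separating "hi" from "lo" are assumptions made for illustration; the specification does not state how the labels in the Tables were assigned.

```python
def expression_pattern(levels, threshold):
    """Label each time point 'hi' or 'lo' and join the labels with hyphens.

    levels: expression measurements ordered along the androgen-withdrawal
    time course. threshold: hypothetical cutoff separating 'hi' from 'lo'.
    """
    return "-".join("hi" if level >= threshold else "lo" for level in levels)


def group_by_pattern(genes, threshold):
    """Group gene identifiers by their hi/lo pattern."""
    groups = {}
    for gene, levels in genes.items():
        pattern = expression_pattern(levels, threshold)
        groups.setdefault(pattern, []).append(gene)
    return groups
```

With three time points the labels correspond to the Table 1A style patterns (e.g., lo-lo-hi); with four time points they correspond to the Table 2A style patterns (e.g., hi-lo-lo-hi).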


Definitions 



The term "androgen ablation therapy" refers to techniques for the removal or
destruction of sources of male hormones, such as testosterone. These techniques include, for
example, 1) surgical removal of the testicles, 2) medications such as gonadotropin releasing
hormone analogs that inhibit testosterone production, or 3) anti-androgenic drugs that block
androgen receptors.

The term "androgen-independent prostate cancer protein" or "androgen-independent
prostate cancer polynucleotide" or "androgen-independent prostate cancer-associated
transcript" refers to nucleic acid and polypeptide polymorphic variants, alleles, mutants, and
interspecies homologues that: (1) have a nucleotide sequence that has greater than about 60%
nucleotide sequence identity, 65%, 70%, 75%, 80%, 85%, 90%, preferably 91%, 92%, 93%,
94%, 95%, 96%, 97%, 98% or 99% or greater nucleotide sequence identity, preferably over a
region of at least about 25, 50, 100, 200, 500, 1000, or more nucleotides, to a
nucleotide sequence of or associated with a unigene cluster of Tables 1A-4; (2) bind to
antibodies, e.g., polyclonal antibodies, raised against an immunogen comprising an amino
acid sequence encoded by a nucleotide sequence of or associated with a unigene cluster of
Tables 1A-4 and conservatively modified variants thereof; (3) specifically hybridize under
stringent hybridization conditions to a nucleic acid sequence, or the complement thereof, of
Tables 1A-4 and conservatively modified variants thereof; or (4) have an amino acid
sequence that has greater than about 60% amino acid sequence identity, 65%, 70%, 75%,
80%, 85%, 90%, preferably 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% or greater
amino acid sequence identity, preferably over a region of at least about 25, 50,
100, 200, 500, 1000, or more amino acids, to an amino acid sequence encoded by a nucleotide
sequence of or associated with a unigene cluster of Tables 1A-4. These polynucleotides or
proteins may also be expressed during a period following androgen withdrawal. A
polynucleotide or polypeptide sequence is typically from a mammal including, but not
limited to, primate, e.g., human; rodent, e.g., rat, mouse, hamster; cow, pig, horse, sheep, or
other mammal. A "prostate cancer polypeptide" and a "prostate cancer polynucleotide"
include both naturally occurring and recombinant forms, and may refer to those polypeptides
or polynucleotides which are expressed in prostate proliferative cells.

A "full length" prostate cancer protein or nucleic acid refers to a prostate cancer
polypeptide or polynucleotide sequence, or a variant thereof, that contains the elements
normally contained in one or more naturally occurring, wild type prostate cancer
polynucleotide or polypeptide sequences. The "full length" may be prior to, or after, various
stages of post-translation processing or splicing, including alternative splicing.

"Biological sample" as used herein is a sample of biological tissue or fluid that
contains nucleic acids or polypeptides, e.g., of a prostate cancer protein, polynucleotide or
transcript. Such samples include, but are not limited to, tissue isolated from primates, e.g.,
humans, or rodents, e.g., mice, and rats. Biological samples may also include sections of
tissues such as biopsy and autopsy samples, frozen sections taken for histology purposes,
blood, plasma, serum, sputum, stool, tears, mucus, hair, skin, etc. Biological samples also
include explants and primary and/or transformed cell cultures derived from patient tissues. A
biological sample is typically obtained from a eukaryotic organism, most preferably a
mammal such as a primate, e.g., chimpanzee or human; cow; dog; cat; a rodent, e.g., guinea
pig, rat, mouse; rabbit; or a bird; reptile; or fish.

"Providing a biological sample" means to obtain a biological sample for use in
methods described in this invention. Most often, this will be done by removing a sample of
cells from an animal, but can also be accomplished by using previously isolated cells (e.g.,
isolated by another person, at another time, and/or for another purpose), by collecting a
sample which contains a soluble polypeptide or nucleic acid derived from a prostate cell, or
by performing the methods of the invention in vivo. Archival tissues, having treatment or
outcome history, will be particularly useful.

The terms "identical" or percent "identity," in the context of two or more nucleic
acids or polypeptide sequences, refer to two or more sequences or subsequences that are the
same or have a specified percentage of amino acid residues or nucleotides that are the same
(i.e., about 60% identity, preferably 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%,
96%, 97%, 98%, 99%, or higher identity over a specified region, when compared and aligned
for maximum correspondence over a comparison window or designated region) as measured
using the BLAST or BLAST 2.0 sequence comparison algorithms with default parameters
described below, or by manual alignment and visual inspection (see, e.g., NCBI web site
http://www.ncbi.nlm.nih.gov/BLAST/ or the like). Such sequences are then said to be
"substantially identical." This definition also refers to, or may be applied to, the complement
of a test sequence. The definition also includes sequences that have deletions and/or
additions, as well as those that have substitutions, as well as naturally occurring, e.g.,
polymorphic or allelic variants, and man-made variants. As described below, the preferred
algorithms can account for gaps and the like. Preferably, identity exists over a region that is
at least about 25 amino acids or nucleotides in length, or more preferably over a region that is
50-100 amino acids or nucleotides in length.
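
As a minimal sketch of the percent identity calculation, the function below counts identical positions between two sequences that have already been aligned (for example by BLAST or by manual alignment). The function name and the treatment of gap columns are assumptions made for this illustration; alignment programs differ in how they count gapped positions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns in which both sequences carry the same residue.

    aligned_a, aligned_b: equal-length strings taken from an existing
    alignment, with gaps written as `gap`. Columns in which both sequences
    have a gap are ignored; a gap opposite a residue counts as a mismatch
    (one convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

Under the definition above, a pair of sequences scoring about 60% or higher over the specified region would be described as substantially identical.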

For sequence comparison, typically one sequence acts as a reference sequence, to
which test sequences are compared. When using a sequence comparison algorithm, test and
reference sequences are entered into a computer, subsequence coordinates are designated, if
necessary, and sequence algorithm program parameters are designated. Preferably, default
program parameters can be used, or alternative parameters can be designated. The sequence
comparison algorithm then calculates the percent sequence identities for the test sequences
relative to the reference sequence, based on the program parameters.

A "comparison window", as used herein, includes reference to a segment of one of the
number of contiguous positions selected from the group consisting typically of from 20 to
600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence
may be compared to a reference sequence of the same number of contiguous positions after
the two sequences are optimally aligned. Methods of alignment of sequences for comparison
are well-known in the art. Optimal alignment of sequences for comparison can be conducted,
e.g., by the local homology algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2:482, by
the homology alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443-453,
by the search for similarity method of Pearson and Lipman (1988) Proc. Nat'l. Acad.
Sci. USA 85:2444-2448, by computerized implementations of these algorithms (GAP,
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics
Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual
inspection (see, e.g., Ausubel, et al. (eds. 1995 and supplements) Current Protocols in
Molecular Biology Lippincott).

Preferred examples of algorithms that are suitable for determining percent sequence
identity and sequence similarity include the BLAST and BLAST 2.0 algorithms, which are
described in Altschul, et al. (1997) Nuc. Acids Res. 25:3389-3402 and Altschul, et al. (1990)
J. Mol. Biol. 215:403-410. BLAST and BLAST 2.0 are used, with the parameters described
herein, to determine percent sequence identity for the nucleic acids and proteins of the
invention. Software for performing BLAST analyses is publicly available through the
National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This
algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short
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words of length W in the query sequence, which either match or satisfy some positive-valued 
threshold score T when aligned with a word of the same length in a database sequence. T is
referred to as the neighborhood word score threshold (Altschul, et al., supra). These initial
neighborhood word hits act as seeds for initiating searches to find longer HSPs containing
them. The word hits are extended in both directions along each sequence for as far as the
cumulative alignment score can be increased. Cumulative scores are calculated using, e.g.,
for nucleotide sequences, the parameters M (reward score for a pair of matching residues;
always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid
sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word
hits in each direction is halted when: the cumulative alignment score falls off by the
quantity X from its maximum achieved value; the cumulative score goes to zero or below,
due to the accumulation of one or more negative-scoring residue alignments; or the end of
either sequence is reached. The BLAST algorithm parameters W, T, and X determine the
sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences)
uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a
comparison of both strands. For amino acid sequences, the BLASTP program uses as
defaults a wordlength of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix
(see Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89:10915-919), alignments (B)
of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
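
By way of illustration only, the ungapped extension of a word hit described above can be
sketched in Python. This is a minimal sketch and not the BLAST implementation: the function
names, the assumption that the seed is an exact match of W residues, and the drop-off value
X=20 are hypothetical choices made for the example. M=5 and N=-4 are the BLASTN defaults
recited above.

```python
def extend_one_direction(q_flank, s_flank, M=5, N=-4, X=20, start_score=0):
    """Extend outward from a seed without gaps.

    q_flank and s_flank are the query and database residues read outward
    from the seed. Returns (best_score, residues_kept).
    """
    score = best = start_score
    best_len = 0
    for i, (a, b) in enumerate(zip(q_flank, s_flank), start=1):
        score += M if a == b else N          # reward a match, penalize a mismatch
        if score > best:
            best, best_len = score, i
        # Halt when the score drops by X from its maximum or reaches zero or below;
        # running out of residues in either sequence ends the loop as well.
        if best - score >= X or score <= 0:
            break
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, W=11, M=5, N=-4, X=20):
    """Extend an exact W-residue seed in both directions (illustrative only).

    Returns (score, query_start, query_end) of the resulting segment pair.
    """
    seed_score = W * M                       # assumption: the seed is an exact match
    right_best, right_len = extend_one_direction(
        query[q_pos + W:], subject[s_pos + W:], M, N, X, seed_score)
    left_best, left_len = extend_one_direction(
        query[:q_pos][::-1], subject[:s_pos][::-1], M, N, X, seed_score)
    total = right_best + left_best - seed_score   # count the seed score once
    return total, q_pos - left_len, q_pos + W + right_len
```

A larger X lets the extension tolerate longer runs of mismatches before halting, which is
one way in which X trades speed for sensitivity as stated above.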

The BLAST algorithm also performs a statistical analysis of the similarity between
two sequences (see, e.g., Karlin and Altschul (1993) Proc. Nat'l. Acad. Sci. USA
90:5873-5877). One measure of similarity provided by the BLAST algorithm is the smallest sum
probability (P(N)), which provides an indication of the probability by which a match between
two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid
is considered similar to a reference sequence if the smallest sum probability in a comparison
of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably
less than about 0.01, and most preferably less than about 0.001. Log values may be large
negative numbers, e.g., 5, 10, 20, 30, 40, 40, 70, 90, 110, 150, 170, etc.

An indication that two nucleic acid sequences or polypeptides are substantially
identical is that the polypeptide encoded by the first nucleic acid is immunologically cross
reactive with the antibodies raised against the polypeptide encoded by the second nucleic
acid, as described below. Thus, a polypeptide is typically substantially identical to a second
polypeptide, e.g., where the two peptides differ only by conservative substitutions. Another
indication that two nucleic acid sequences are substantially identical is that the two molecules
or their complements hybridize to each other under stringent conditions, as described below.
Yet another indication that two nucleic acid sequences are substantially identical is that the
same primers can be used to amplify the sequences.

A "host cell" is a naturally occurring cell or a transformed cell that contains an
expression vector and supports the replication or expression of the expression vector. Host
cells may be cultured cells, explants, cells in vivo, and the like. Host cells may be
prokaryotic cells such as E. coli, or eukaryotic cells such as yeast, insect, amphibian, or
mammalian cells such as CHO, HeLa, and the like (see, e.g., the American Type Culture
Collection catalog or web site, www.atcc.org).

The terms "isolated," "purified," or "biologically pure" refer to material that is
substantially or essentially free from components that normally accompany it as found in its
native state. Purity and homogeneity are typically determined using analytical chemistry
techniques such as polyacrylamide gel electrophoresis or high performance liquid
chromatography. A protein or nucleic acid that is the predominant species present in a
preparation is substantially purified. In particular, an isolated nucleic acid is separated from
some open reading frames that naturally flank the gene and encode proteins other than the
protein encoded by the gene. The term "purified" in some embodiments denotes that a nucleic
acid or protein gives rise to essentially one band in an electrophoretic gel. Preferably, it
means that the nucleic acid or protein is at least 85% pure, more preferably at least 95% pure,
and most preferably at least 99% pure. "Purify" or "purification" in other embodiments means
removing at least one contaminant from the composition to be purified. In this sense,
purification does not require that the purified compound be homogeneous, e.g., 100% pure.

The terms "polypeptide," "peptide," and "protein" are used interchangeably herein to
refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which
one or more amino acid residue is an artificial chemical mimetic of a corresponding naturally
occurring amino acid, as well as to naturally occurring amino acid polymers, those containing
modified residues, and non-naturally occurring amino acid polymers. Certain diagnostic
methods may evaluate secreted or breakdown products present only because the producing
cell is present, and would otherwise be absent in a normal individual.

The term "amino acid" refers to naturally occurring and synthetic amino acids, as well
as amino acid analogs and amino acid mimetics that function similarly to the naturally
occurring amino acids. Naturally occurring amino acids are those encoded by the genetic
code, as well as those amino acids that are later modified, e.g., hydroxyproline,
γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refers to compounds that have
the same basic chemical structure as a naturally occurring amino acid, e.g., an α carbon that is
bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine,
norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs may have
modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic
chemical structure as a naturally occurring amino acid. Amino acid mimetics refers to
chemical compounds that have a structure that is different from the general chemical
structure of an amino acid, but that function similarly to a naturally occurring amino acid.

Amino acids may be referred to herein by either their commonly known three letter
symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical
Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly
accepted single-letter codes.

"Conservatively modified variants" applies to both amino acid and nucleic acid
sequences. With respect to particular nucleic acid sequences, conservatively modified
variants refers to those nucleic acids which encode identical or essentially identical amino
acid sequences, or where the nucleic acid does not encode an amino acid sequence, to
essentially identical or associated, e.g., naturally contiguous, sequences. Because of the
degeneracy of the genetic code, a large number of functionally identical nucleic acids encode
most proteins. For instance, the codons GCA, GCC, GCG, and GCU encode the amino acid
alanine. Thus, at every position where an alanine is specified by a codon, the codon can be
altered to another of the corresponding codons described without altering the encoded
polypeptide. Such nucleic acid variations are "silent variations," which are one species of
conservatively modified variations. Every nucleic acid sequence herein which encodes a
polypeptide also describes silent variations of the nucleic acid. One of skill will recognize
that in certain contexts each codon in a nucleic acid (except AUG, which is ordinarily the
only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can
be modified to yield a functionally identical molecule. Accordingly, often silent variations of
a nucleic acid which encodes a polypeptide are implicit in a described sequence with respect to
the expression product, but not with respect to actual probe sequences.
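
For illustration, the notion of silent variations can be sketched in Python. The sketch
below is an assumption-laden example rather than any part of the described methods: the
codon table is deliberately partial and holds only the codons named above (the four alanine
codons, AUG for methionine, and the tryptophan codon written in the RNA alphabet as UGG),
and the function names are hypothetical.

```python
from itertools import product

# Partial codon table limited to the codons named in the text (RNA alphabet).
CODON_TO_AA = {
    "GCA": "A", "GCC": "A", "GCG": "A", "GCU": "A",  # alanine
    "AUG": "M",                                       # methionine
    "UGG": "W",                                       # tryptophan
}


def translate(rna):
    """Translate an RNA string codon by codon using the partial table."""
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))


def synonymous_codons(codon):
    """All codons in the table that encode the same amino acid as `codon`."""
    aa = CODON_TO_AA[codon]
    return sorted(c for c, a in CODON_TO_AA.items() if a == aa)


def silent_variants(rna):
    """Enumerate every sequence that encodes the same peptide as `rna`."""
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    choices = [synonymous_codons(c) for c in codons]
    return ["".join(combo) for combo in product(*choices)]
```

With this table, a sequence encoding methionine-alanine-tryptophan has four silent
variants, one for each alanine codon, because the methionine and tryptophan codons have no
alternatives.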

As to amino acid sequences, one of skill will recognize that individual substitutions,
deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which
alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded
sequence is a "conservatively modified variant" where the alteration results in the substitution
of an amino acid with a chemically similar amino acid. Conservative substitutions providing
functionally similar amino acids are well known in the art. Such conservatively modified
variants are in addition to and do not exclude polymorphic variants, interspecies homologs,
and alleles of the invention. The following groups are typically conservative substitutions
for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L),
Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); 7) Serine (S),
Threonine (T); and 8) Cysteine (C), Methionine (M) (see, e.g., Creighton (1984) Proteins
Freeman).
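
The eight groups listed above can be encoded directly, as in the following Python sketch.
It is offered only as an illustration: the function names are hypothetical, and the
position-by-position comparison assumes two peptides of equal length that are already
aligned without gaps, which is a simplification.

```python
# The eight groups recited above, by one-letter code.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(a, b):
    """True if residues a and b differ and share at least one group."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def differ_only_conservatively(pep1, pep2):
    """True if two equal-length, pre-aligned peptides differ only by
    substitutions within the groups above (assumption: no gaps)."""
    if len(pep1) != len(pep2):
        return False
    return all(x == y or is_conservative(x, y) for x, y in zip(pep1, pep2))
```

Methionine appears in both group 5 and group 8, so the membership test checks every group
rather than assigning each residue to a single group.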

Macromolecular structures such as polypeptide structures can be described in terms of
various levels of organization. For a general discussion of this organization, see, e.g.,
Alberts, et al. (2001) Molecular Biology of the Cell (4th ed.) and Cantor and Schimmel
(1980) Biophysical Chemistry Part 1: The Conformation of Biological Macromolecules
Freeman. "Primary structure" refers to the amino acid sequence of a particular peptide.
"Secondary structure" refers to locally ordered, three dimensional structures within a
polypeptide. These structures are commonly known as domains. Domains are portions of a
polypeptide that often form a compact unit of the polypeptide and are typically 25 to
approximately 500 amino acids long. Typical domains are made up of sections of lesser
organization such as stretches of β-sheet and α-helices. "Tertiary structure" refers to the
complete three dimensional structure of a polypeptide monomer. "Quaternary structure"
refers to the three dimensional structure formed, usually by the noncovalent association of
independent tertiary units. Anisotropic terms are also known as energy terms.

"Nucleic acid" or "oligonucleotide" or "polynucleotide" or grammatical equivalents
used herein means at least two nucleotides covalently linked together. Oligonucleotides are
typically from about 5, 6, 7, 8, 9, 10, 12, 15, 25, 30, 40, 50 or more nucleotides in length, up
to about 100 nucleotides in length. Nucleic acids and polynucleotides are polymers of
virtually any length, including longer lengths, e.g., 200, 300, 500, 1000, 2000, 3000, 5000,
7000, 10,000, etc. A nucleic acid of the present invention will generally contain
phosphodiester bonds, although in some cases, nucleic acid analogs are included that may
have alternate backbones, comprising, e.g., phosphoramidate, phosphorothioate,
phosphorodithioate, or O-methylphosphoroamidite linkages (see Eckstein (1992)
Oligonucleotides and Analogues: A Practical Approach Oxford University Press); and
peptide nucleic acid backbones and linkages. Other analog nucleic acids include those with
positive backbones; non-ionic backbones, and non-ribose backbones, including those
described in U.S. Patent Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7 in Sanghvi and
Cook (eds. 1994) Carbohydrate Modifications in Antisense Research ACS Symposium Series
580. Nucleic acids containing one or more carbocyclic sugars are also included within one
definition of nucleic acids. Modifications of the ribose-phosphate backbone may be done for a
variety of reasons, e.g., to increase the stability and half-life of such molecules in
physiological environments or as probes on a biochip. Mixtures of naturally occurring nucleic
acids and analogs can be made; alternatively, mixtures of different nucleic acid analogs, and
mixtures of naturally occurring nucleic acids and analogs may be made.

A variety of references disclose such nucleic acid analogs, including, for example,
phosphoramidate (Beaucage, et al. (1993) Tetrahedron 49(10):1925-1963 and references
therein; Letsinger (1970) J. Org. Chem. 35:3800-3803; Sprinzl, et al. (1977) Eur. J. Biochem.
81:579-589; Letsinger, et al. (1986) Nucl. Acids Res. 14:3487-499; Sawai, et al. (1984)
Chem. Lett. 805; Letsinger, et al. (1988) J. Am. Chem. Soc. 110:4470-4471; and Pauwels, et
al. (1986) Chemica Scripta 26:141-149), phosphorothioate (Mag, et al. (1991) Nucleic Acids
Res. 19:1437-441; and U.S. Patent No. 5,644,048), phosphorodithioate (Briu, et al. (1989) J.
Am. Chem. Soc. 111:2321-xxx), O-methylphosphoroamidite linkages (see Eckstein (1992)
Oligonucleotides and Analogues: A Practical Approach Oxford University Press), and
peptide nucleic acid backbones and linkages (see Egholm (1992) J. Am. Chem. Soc.
114:1895-1897; Meier, et al. (1992) Chem. Int. Ed. Engl. 31:1008-1010; Nielsen (1993)
Nature 365:566-568; Carlsson, et al. (1996) Nature 380:207, each of which is incorporated by
reference). Other analog nucleic acids include those with positive backbones (Dempcy, et al.
(1995) Proc. Natl. Acad. Sci. USA 92:6097-101); non-ionic backbones (U.S. Patent Nos.
5,386,023, 5,637,684, 5,602,240, 5,216,141 and 4,469,863; Kiedrowski, et al. (1991) Angew.
Chem. Intl. Ed. English 30:423-426; Letsinger, et al. (1988) J. Am. Chem. Soc. 110:4470;
Letsinger, et al. (1994) Nucleoside and Nucleotide 13:1597-xxx; Chapters 2 and 3 in
Sanghvi and Cook (eds. 1994) Carbohydrate Modifications in Antisense Research ACS
Symposium Series 580; Mesmaeker, et al. (1994) Bioorganic and Medicinal Chem. Lett.
4:395-xxx; Jeffs, et al. (1994) J. Biomolecular NMR 34:17; Horn (1996) Tetrahedron Lett.
37:743-xxx) and non-ribose backbones, including those described in U.S. Patent Nos.
5,235,033 and 5,034,506, and Chapters 6 and 7 in Sanghvi and Cook (eds. 1994)
Carbohydrate Modifications in Antisense Research ACS Symposium Series 580. Nucleic
acids containing one or more carbocyclic sugars are also included within one definition of
nucleic acids (see Jenkins, et al. (1995) Chem. Soc. Rev. xx:169-176). Several nucleic acid
analogs are described in Rawls (p. 35, June 2, 1997) C&E News. Each of these references is
hereby expressly incorporated by reference.

Particularly preferred are peptide nucleic acids (PNA), which include peptide nucleic
acid analogs. These backbones are substantially non-ionic under neutral conditions, in
contrast to the highly charged phosphodiester backbone of naturally occurring nucleic acids.
This results in two advantages. First, the PNA backbone exhibits improved hybridization
kinetics. PNAs have larger changes in the melting temperature (Tm) for mismatched versus
perfectly matched base pairs. DNA and RNA typically exhibit a 2-4° C drop in Tm for an
internal mismatch. With the non-ionic PNA backbone, the drop is closer to 7-9° C.
Similarly, due to their non-ionic nature, hybridization of the bases attached to these
backbones is relatively insensitive to salt concentration. In addition, PNAs are not degraded
by cellular enzymes, and thus can be more stable.

The nucleic acids may be single stranded or double stranded, as specified, or contain
portions of both double stranded or single stranded sequence. As will be appreciated by those
in the art, the depiction of a single strand also defines the sequence of the complementary
strand; thus the sequences described herein also provide the complement of the sequence.
The nucleic acid may be DNA, both genomic and cDNA, RNA or a hybrid, where the nucleic
acid may contain combinations of deoxyribo- and ribo-nucleotides, and combinations of
bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine,
isocytosine, isoguanine, etc. "Transcript" typically refers to a naturally occurring RNA, e.g.,
a pre-mRNA, hnRNA, or mRNA. As used herein, the term "nucleoside" includes nucleotides
and nucleoside and nucleotide analogs, and modified nucleosides such as amino modified
nucleosides. In addition, "nucleoside" includes non-naturally occurring analog structures.
Thus, e.g., the individual units of a peptide nucleic acid, each containing a base, are referred
to herein as a nucleoside.
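
The statement above that a depicted single strand also defines its complementary strand
can be illustrated with a short Python sketch. The function names are hypothetical, and the
sketch assumes an unmodified DNA sequence written with the letters A, C, G, and T only; the
other bases mentioned above, such as inosine, are not handled.

```python
# Watson-Crick pairing for unmodified DNA bases (upper and lower case).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(strand):
    """Base-by-base complement, written in the same left-to-right order."""
    return strand.translate(_DNA_COMPLEMENT)


def reverse_complement(strand):
    """Complementary strand written 5' to 3', the usual convention."""
    return complement(strand)[::-1]
```

Applying reverse_complement twice returns the original sequence, which is why depicting
either strand is sufficient to specify both.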

A "label" or a "detectable moiety" is a composition detectable by spectroscopic,
photochemical, biochemical, immunochemical, chemical, or other physical means. For
example, useful labels include 32P, fluorescent dyes, electron-dense reagents, enzymes (e.g.,
as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins or other entities
which can be made detectable, e.g., by incorporating a radiolabel into the peptide or used to
detect antibodies specifically reactive with the peptide. The labels may be incorporated into
the prostate cancer nucleic acids, proteins, and antibodies at virtually any position. Many
methods for conjugating the antibody to the label may be employed, including those methods
described by Hunter, et al. (1962) Nature 144:945; David, et al. (1974) Biochemistry
13:1014-1021; Pain, et al. (1981) J. Immunol. Meth. 40:219-230; and Nygren (1982) J.
Histochem. and Cytochem. 30:407-412.

An "effector" or "effector moiety" or "effector component" is a molecule that is
bound (or linked, or conjugated), either covalently, through a linker or a chemical bond, or
noncovalently, through ionic, van der Waals, electrostatic, or hydrogen bonds, to an antibody.
The "effector" can be a variety of molecules including, e.g., detection moieties including
radioactive compounds, fluorescent compounds, an enzyme or substrate, tags such as epitope
tags, a toxin; activatable moieties, a chemotherapeutic agent; a lipase; an antibiotic; or a
radioisotope emitting "hard" e.g., beta radiation.

A "labeled nucleic acid probe or oligonucleotide" is one that is bound, either
covalently, through a linker or a chemical bond, or noncovalently, through ionic, van der
Waals, electrostatic, or hydrogen bonds to a label such that the presence of the probe may be
detected by detecting the presence of the label bound to the probe. Alternatively, methods
using high affinity interactions may achieve the same results where one of a pair of binding
partners binds to the other, e.g., biotin, streptavidin.

As used herein a "nucleic acid probe or oligonucleotide" is defined as a nucleic acid
capable of binding to a target nucleic acid of complementary sequence through one or more
types of chemical bonds, usually through complementary base pairing, usually through
hydrogen bond formation. As used herein, a probe may include natural (i.e., A, G, C, or T) or
modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in a probe may be
joined by a linkage other than a phosphodiester bond, so long as it does not functionally
interfere with hybridization. Thus, e.g., probes may be peptide nucleic acids in which the
constituent bases are joined by peptide bonds rather than phosphodiester linkages. It will be
understood by one of skill in the art that probes may bind target sequences lacking complete
complementarity with the probe sequence depending upon the stringency of the hybridization
conditions. The probes are preferably directly labeled as with isotopes, chromophores,
lumiphores, chromogens, or indirectly labeled such as with biotin to which a streptavidin
complex may later bind. By assaying for the presence or absence of the probe, one can detect
the presence or absence of the select sequence or subsequence. Diagnosis or prognosis may
be based at the genomic level, or at the level of RNA or protein expression.

The term "recombinant" when used with reference, e.g., to a cell, or nucleic acid,
protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by
the introduction of a heterologous nucleic acid or protein or the alteration of a native nucleic
acid or protein, or that the cell is derived from a cell so modified. Thus, e.g., recombinant
cells express genes that are not found within the native (non-recombinant) form of the cell or
express native genes that are otherwise abnormally expressed, under expressed or not
expressed at all. By the term "recombinant nucleic acid" herein is meant nucleic acid,
originally formed in vitro, in general, by the manipulation of nucleic acid, e.g., using
polymerases and endonucleases, in a form not normally found in nature. In this manner,
operable linkage of different sequences is achieved. Thus an isolated nucleic acid, in a linear
form, or an expression vector formed in vitro by ligating DNA molecules that are not
normally joined, are both considered recombinant for the purposes of this invention. It is
understood that once a recombinant nucleic acid is made and reintroduced into a host cell or
organism, it will replicate non-recombinantly, i.e., using the in vivo cellular machinery of the
host cell rather than in vitro manipulations; however, such nucleic acids, once produced
recombinantly, although subsequently replicated non-recombinantly, are still considered
recombinant for the purposes of the invention. Similarly, a "recombinant protein" is a protein
made using recombinant techniques, i.e., through the expression of a recombinant nucleic
acid as depicted above.

The term "heterologous" when used with reference to portions of a nucleic acid
indicates that the nucleic acid comprises two or more subsequences that are not normally
found in the same relationship to each other in nature. For instance, the nucleic acid is
typically recombinantly produced, having two or more sequences, e.g., from unrelated genes
arranged to make a new functional nucleic acid, e.g., a promoter from one source and a
coding region from another source. Similarly, a heterologous protein will often refer to two
or more subsequences that are not found in the same relationship to each other in nature (e.g.,
a fusion protein).

A "promoter" is defined as an array of nucleic acid control sequences that direct
transcription of a nucleic acid. As used herein, a promoter includes necessary nucleic acid
sequences near the start site of transcription, such as, in the case of a polymerase II type
promoter, a TATA element. A promoter also optionally includes distal enhancer or repressor
elements, which can be located as much as several thousand base pairs from the start site of
transcription. A "constitutive" promoter is a promoter that is active under most
environmental and developmental conditions. An "inducible" promoter is a promoter that is
active under environmental or developmental regulation. The term "operably linked" refers
to a functional linkage between a nucleic acid expression control sequence (such as a
promoter, or array of transcription factor binding sites) and a second nucleic acid sequence,
wherein the expression control sequence directs transcription of the nucleic acid
corresponding to the second sequence.

An "expression vector" is a nucleic acid construct, generated recombinantly or
synthetically, with a series of specified nucleic acid elements that permit transcription of a
particular nucleic acid in a host cell. The expression vector can be part of a plasmid, virus, or
nucleic acid fragment. Typically, the expression vector includes a nucleic acid to be
transcribed operably linked to a promoter.

The phrase "selectively (or specifically) hybridizes to" refers to the binding,
duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under
stringent hybridization conditions when that sequence is present in a complex mixture (e.g.,
total cellular or library DNA or RNA).

The phrase "stringent hybridization conditions" refers to conditions under which a
probe will hybridize to its target subsequence, typically in a complex mixture of nucleic
acids, but to no other sequences. Stringent conditions are sequence-dependent and will be
different in different circumstances. Longer sequences hybridize specifically at higher
temperatures. An extensive guide to the hybridization of nucleic acids is found in "Overview
of principles of hybridization and the strategy of nucleic acid assays" in Tijssen (1993)
Hybridization with Nucleic Probes (Techniques in Biochemistry and Molecular Biology vol.
24) Elsevier. Generally, stringent conditions are selected to be about 5-10° C lower than the
thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The
Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at
which 50% of the probes complementary to the target hybridize to the target sequence at
equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are
occupied at equilibrium). Stringent conditions will be those in which the salt concentration is
less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or
other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C for short probes (e.g.,
10 to 50 nucleotides) and at least about 60° C for long probes (e.g., greater than 50
nucleotides). Stringent conditions may also be achieved with the addition of destabilizing
agents such as formamide. For selective or specific hybridization, a positive signal is at least
two times background, preferably 10 times background hybridization. Exemplary stringent
hybridization conditions can be as follows: 50% formamide, 5x SSC, and 1% SDS,
incubating at 42° C, or, 5x SSC, 1% SDS, incubating at 65° C, with wash in 0.2x SSC, and
0.1% SDS at 65° C. For PCR, a temperature of about 36° C is typical for low stringency
amplification, although annealing temperatures may vary between about 32° C and 48° C
depending on primer length. For high stringency PCR amplification, a temperature of about
62° C is typical, although high stringency annealing temperatures can range from about
50-65° C, depending on the primer length and specificity. Typical cycle conditions for both
high and low stringency amplifications include a denaturation phase of 90-95° C for 30-120
sec, an annealing phase lasting 30-120 sec, and an extension phase of about 72° C for 1-2 min.
Protocols and guidelines for low and high stringency amplification reactions are provided,
e.g., in Innis, et al. (1990) PCR Protocols: A Guide to Methods and Applications Academic
Press, N.Y.
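
The numerical ranges recited above for stringent conditions can be collected into a small
Python sketch. This is an illustrative simplification under stated assumptions: the
function names are hypothetical, the Tm is taken as a given input rather than calculated,
the recitations of "about" are treated as exact limits, and the effect of formamide or
other destabilizing agents is ignored.

```python
def stringent_temperature_window(tm_celsius):
    """Temperature range about 5-10 degrees C below the Tm, as (low, high)."""
    return (tm_celsius - 10.0, tm_celsius - 5.0)


def minimum_stringent_temperature(probe_length):
    """Lower temperature limit: 60 C for probes over 50 nucleotides,
    otherwise 30 C (short probes, e.g., 10 to 50 nucleotides)."""
    return 60.0 if probe_length > 50 else 30.0


def is_stringent(sodium_molar, ph, temp_celsius, probe_length):
    """Check salt, pH, and temperature against the recited ranges.

    Assumption: no formamide is present, so the 42 C formamide-based
    exemplary condition is outside the scope of this check.
    """
    return (
        0.01 <= sodium_molar <= 1.0
        and 7.0 <= ph <= 8.3
        and temp_celsius >= minimum_stringent_temperature(probe_length)
    )
```

For example, a probe with a Tm of 70° C would have a window of 60-65° C under this sketch.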

Nucleic acids that do not hybridize to each other under stringent conditions are still
substantially identical if the polypeptides which they encode are substantially identical. This
occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy
permitted by the genetic code. In such cases, the nucleic acids typically hybridize under
moderately stringent hybridization conditions. Exemplary "moderately stringent
hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl,
1% SDS at 37° C, and a wash in 1X SSC at 45° C. A positive hybridization is at least twice
background. Those of ordinary skill will readily recognize that alternative hybridization and
wash conditions can be utilized to provide conditions of similar stringency. Additional
guidelines for determining hybridization parameters are provided in numerous references,
e.g., Ausubel, et al. (eds. 1991 and supplements) Current Protocols in Molecular Biology.

The phrase "functional effects" in the context of assays for testing compounds that
modulate activity of a prostate cancer protein includes the determination of a parameter that
is indirectly or directly under the influence of the prostate cancer protein or nucleic acid, e.g.,
a functional, physical, or chemical effect, such as the ability to decrease prostate proliferation
(malignant or non-malignant). It includes ligand binding activity; cell growth on soft agar;
anchorage dependence; contact inhibition and density limitation of growth; cellular
proliferation; cellular transformation; growth factor or serum dependence; tumor specific
marker levels; invasiveness into Matrigel; tumor growth and metastasis in vivo; mRNA and
protein expression in cells undergoing metastasis, and other characteristics of prostate cancer
cells. "Functional effects" include in vitro, in vivo, and ex vivo activities.

By "determining the functional effect" is meant assaying for a compound that
increases or decreases a parameter that is indirectly or directly under the influence of a
prostate cancer protein sequence, e.g., functional, enzymatic, physical and chemical effects.
Such functional effects can be measured by means known to those skilled in the art, e.g.,
changes in spectroscopic characteristics (e.g., fluorescence, absorbance, refractive index),
hydrodynamic (e.g., shape), chromatographic, or solubility properties for the protein,
measuring inducible markers or transcriptional activation of the prostate cancer protein;
measuring binding activity or binding assays, e.g., binding to antibodies or other ligands, and
measuring cellular proliferation. Determination of the functional effect of a compound on
prostate cancer can also be performed using prostate cancer assays known to those of skill in
the art such as in vitro assays, e.g., cell growth on soft agar; anchorage dependence;
contact inhibition and density limitation of growth; cellular proliferation; cellular
transformation; growth factor or serum dependence; tumor specific marker levels;
invasiveness into Matrigel; tumor growth and metastasis in vivo; mRNA and protein
expression in cells undergoing metastasis, and other characteristics of prostate cancer cells.
The functional effects can be evaluated by many means known to those skilled in the art, e.g.,
microscopy for quantitative or qualitative measures of alterations in morphological features,
measurement of changes in RNA or protein levels for prostate cancer-associated sequences,
measurement of RNA stability, identification of downstream or reporter gene expression
(CAT, luciferase, β-gal, GFP, and the like), e.g., via chemiluminescence, fluorescence,
colorimetric reactions, antibody binding, inducible markers, and ligand binding assays.

"Inhibitors", "activators", and "modulators" of prostate cancer polynucleotide and
polypeptide sequences are used to refer to activating, inhibitory, or modulating molecules or
compounds identified using in vitro and in vivo assays of prostate cancer polynucleotide and
polypeptide sequences. Inhibitors are compounds that, e.g., bind to, partially or totally block
activity, decrease, prevent, delay activation, inactivate, desensitize, or down regulate the
activity or expression of prostate cancer proteins, e.g., antagonists. Antisense nucleic acids
may seem to inhibit expression and subsequent function of the protein. "Activators" are
compounds that increase, open, activate, facilitate, enhance activation, sensitize, agonize, or
up regulate prostate cancer protein activity. Inhibitors, activators, or modulators also include
genetically modified versions of prostate cancer proteins, e.g., versions with altered activity,
as well as naturally occurring and synthetic ligands, antagonists, agonists, antibodies, small
chemical molecules and the like. Such assays for inhibitors and activators include, e.g.,
expressing the prostate cancer protein in vitro, in cells, or cell membranes, applying putative
modulator compounds, and then determining the functional effects on activity, as described
above. Activators and inhibitors of prostate cancer can also be identified by incubating
prostate cancer cells with the test compound and determining increases or decreases in the
expression of 1 or more prostate cancer proteins, e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50
or more prostate cancer proteins, such as prostate cancer proteins encoded by the sequences
set out in Tables 1A-4.

Samples or assays comprising prostate cancer proteins that are treated with a potential 
activator, inhibitor, or modulator are compared to control samples without the inhibitor,
activator, or modulator to examine the extent of inhibition. Control samples (untreated with
inhibitors) are assigned a relative protein activity value of 100%. Inhibition of a polypeptide
is achieved when the activity value relative to the control is about 80%, preferably 50%, more
preferably 25-0%. Activation of a prostate cancer polypeptide is achieved when the activity
value relative to the control (untreated with activators) is 110%, more preferably 150%, more
preferably 200-500% (i.e., two to five fold higher relative to the control), more preferably
1000-3000% higher.
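
The thresholds above can be read as a simple classification rule. The following Python sketch
is illustrative only and is not part of the described assays: the function name
classify_modulation and its cutoff parameters are assumptions introduced here, with defaults
set to the broadest values stated above (about 80% for inhibition, 110% for activation).

```python
def classify_modulation(treated_activity, control_activity,
                        inhibition_cutoff=80.0, activation_cutoff=110.0):
    """Return (percent_of_control, label) for a treated sample.

    The untreated control is assigned 100%. The cutoffs are illustrative
    defaults taken from the broadest thresholds in the text; stricter
    values (e.g., 50 or 25 for inhibition) can be passed instead.
    """
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    percent = 100.0 * treated_activity / control_activity
    if percent <= inhibition_cutoff:
        label = "inhibition"
    elif percent >= activation_cutoff:
        label = "activation"
    else:
        label = "no effect"
    return percent, label
```

For example, a treated sample measuring 40 activity units against a control of 100 units would
be labeled as inhibition under the default cutoffs.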

The phrase "changes in cell growth" refers to a change in cell growth and
proliferation characteristics in vitro or in vivo, such as cell viability, formation of foci,
anchorage independence, semi-solid or soft agar growth, changes in contact inhibition and
density limitation of growth, loss of growth factor or serum requirements, changes in cell
morphology, gaining or losing immortalization, gaining or losing tumor specific markers,
ability to form or suppress tumors when injected into suitable animal hosts, and/or
immortalization of the cell. See, e.g., pp. 231-241 in Freshney (1994) Culture of Animal
Cells: A Manual of Basic Technique (3d ed.) Wiley-Liss.

"Tumor cell" refers to precancerous, cancerous, and/or normal cells in a tumor. 

"Cancer cells," "transformed" cells, or "transformation" in tissue culture, refers to
spontaneous or induced phenotypic changes that do not necessarily involve the uptake of new
genetic material. Although transformation can arise from infection with a transforming virus
and incorporation of new genomic DNA, or uptake of exogenous DNA, it can also arise
spontaneously or following exposure to a carcinogen, thereby mutating an endogenous gene.
Transformation is associated with phenotypic changes, such as immortalization of cells,
aberrant growth control, nonmorphological changes, and/or malignancy. See, Freshney
(2001) Culture of Animal Cells: A Manual of Basic Technique (4th ed.) Wiley-Liss.

"Antibody" refers to a polypeptide comprising a framework region from an
immunoglobulin gene or fragments thereof that specifically binds and recognizes an antigen.
The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta,
epsilon, and mu constant region genes, as well as the myriad immunoglobulin variable region
genes. Light chains are classified as either kappa or lambda. Heavy chains are classified as
gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG,
IgM, IgA, IgD, and IgE, respectively. Typically, the antigen-binding region of an antibody or
its functional equivalent will be most critical in specificity and affinity of binding. See Paul
(ed. 1999) Fundamental Immunology (4th ed.) Raven.

An exemplary immunoglobulin (antibody) structural unit comprises a tetramer. Each 
tetramer is composed of two identical pairs of polypeptide chains, each pair having one
"light" (about 25 kD) and one "heavy" chain (about 50-70 kD). The N-terminus of each
chain defines a variable region of about 100 to 110 or more amino acids primarily responsible
for antigen recognition. The terms variable light chain (VL) and variable heavy chain (VH)
refer to these light and heavy chains respectively.

Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized
fragments produced by digestion with various peptidases. Thus, e.g., pepsin digests an
antibody below the disulfide linkages in the hinge region to produce F(ab)'2, a dimer of Fab
which itself is a light chain joined to VH-CH1 by a disulfide bond. The F(ab)'2 may be
reduced under mild conditions to break the disulfide linkage in the hinge region, thereby
converting the F(ab)'2 dimer into an Fab' monomer. The Fab' monomer is essentially Fab
with part of the hinge region (see Paul (ed. 1993) Fundamental Immunology (3d ed.) Raven).
While various antibody fragments are defined in terms of the digestion of an intact antibody,
one of skill will appreciate that such fragments may be synthesized de novo either chemically
or by using recombinant DNA methodology. Thus, the term antibody, as used herein, also
includes antibody fragments either produced by the modification of whole antibodies, or
those synthesized de novo using recombinant DNA methodologies (e.g., single chain Fv) or
those identified using phage display libraries (see, e.g., McCafferty, et al. (1990) Nature
348:552-554).

For preparation of antibodies, e.g., recombinant, monoclonal, or polyclonal
antibodies, many techniques known in the art can be used (see, e.g., Kohler and Milstein
(1975) Nature 256:495-497; Kozbor, et al. (1983) Immunology Today 4:72; pp. 77-96 in
Cole, et al. (1985) Monoclonal Antibodies and Cancer Therapy Liss; Coligan (1991) Current
Protocols in Immunology Lippincott; Harlow and Lane (1988) Antibodies: A Laboratory
Manual CSH Press; and Goding (1986) Monoclonal Antibodies: Principles and Practice (2d
ed.) Academic Press). Techniques for the production of single chain antibodies (U.S. Patent
4,946,778) can be adapted to produce antibodies to polypeptides of this invention. Also,
transgenic mice, or other organisms such as other mammals, may be used to express
humanized antibodies. Alternatively, phage display technology can be used to identify
antibodies and heteromeric Fab fragments that specifically bind to selected antigens (see, e.g.,
McCafferty, et al. (1990) Nature 348:552-554; Marks, et al. (1992) Biotechnology 10:779-
783).

A "chimeric antibody" is an antibody molecule in which (a) the constant region, or a 
portion thereof, is altered, replaced or exchanged so that the antigen binding site (variable
region) is linked to a constant region of a different or altered class, effector function and/or
species, or an entirely different molecule which confers new properties to the chimeric
antibody, e.g., an enzyme, toxin, hormone, growth factor, drug, etc.; or (b) the variable
region, or a portion thereof, is altered, replaced or exchanged with a variable region having a
different or altered antigen specificity.

Identification of prostate cancer-associated sequences 
In one aspect, the expression levels of genes are determined in different patient
samples for which diagnosis information is desired, to provide expression profiles. An
expression profile of a particular sample is essentially a "fingerprint" of the state of the
sample; while two states may have a particular gene similarly expressed, the evaluation of a
number of genes simultaneously allows the generation of a gene expression profile that is
characteristic of the state of the cell. That is, normal tissue (e.g., normal prostate or other
tissue) may be distinguished from pathological prostate cells, e.g., cancerous or metastatic
cancerous tissue of the prostate, or prostate cancer tissue or metastatic prostate cancerous
tissue can be compared with tissue samples of prostate and other tissues from surviving
cancer patients. By comparing expression profiles of tissue in known different prostate
cancer states, information regarding which genes are important (including both up- and
down-regulation of genes) in each of these states is obtained.

The identification of sequences that are differentially expressed in prostate cancer 
versus non-prostate cancer tissue allows the use of this information in a number of ways. For 
example, a particular treatment regime may be evaluated: does a chemotherapeutic drug act
to down-regulate prostate cancer or other proliferative disorders, and thus tumor growth or
recurrence, in a particular patient? Alternatively, a treatment step may induce other markers
which may be used as targets to destroy tumor cells. Similarly, diagnosis and treatment
outcomes may be done or confirmed by comparing patient samples with the known
expression profiles. Malignant disease may be compared to non-malignant conditions.
Metastatic tissue can also be analyzed to determine the stage of prostate cancer in the tissue,
or origin of primary tumor, e.g., metastasis from a remote primary site. Furthermore, these
gene expression profiles (or individual genes) allow screening of drug candidates with an eye
to mimicking or altering a particular expression profile; e.g., screening can be done for drugs
that suppress the prostate cancer expression profile. This may be done by making biochips
comprising sets of the important prostate cancer genes, which can then be used in these
screens. These methods can also be done on the protein basis; that is, protein expression
levels of the prostate cancer proteins can be evaluated for diagnostic purposes or to screen
candidate agents. In addition, the prostate cancer nucleic acid sequences can be administered
for gene therapy purposes, including the administration of antisense nucleic acids, or the
prostate cancer proteins (including antibodies and other modulators thereof) administered as
therapeutic drugs.

Thus the present invention provides nucleic acid and protein sequences that are
differentially expressed in prostate cancer relative to normal tissues and/or non-malignant
disease, or in different types of related diseases, herein termed "prostate cancer sequences."
As outlined below, prostate cancer sequences include those that are up-regulated (i.e.,
expressed at a higher level) in prostate cancer, as well as those that are down-regulated (i.e.,
expressed at a lower level). In a preferred embodiment, the prostate cancer sequences are
from humans; however, as will be appreciated by those in the art, prostate cancer sequences
from other organisms may be useful in animal models of disease and drug evaluation; thus,
other prostate cancer sequences are provided, from vertebrates, including mammals,
including rodents (rats, mice, hamsters, guinea pigs, etc.), primates, farm animals (including
sheep, goats, pigs, cows, horses, etc.) and pets (e.g., dogs, cats, etc.). Prostate cancer
sequences from other organisms may be obtained using the techniques outlined below.

Prostate cancer sequences can include both nucleic acid and amino acid sequences. 
As will be appreciated by those in the art and is more fully outlined below, prostate cancer 
nucleic acid sequences are useful in a variety of applications, including diagnostic
applications, which will detect naturally occurring nucleic acids, as well as screening
applications; e.g., biochips comprising nucleic acid probes or PCR microtiter plates with
selected probes to the prostate cancer sequences can be generated.

A prostate cancer sequence can be initially identified by substantial nucleic acid 
and/or amino acid sequence homology to the prostate cancer sequences outlined herein. Such
homology can be based upon the overall nucleic acid or amino acid sequence, and is
generally determined as outlined below, using either homology programs or hybridization
conditions.

For identifying prostate cancer-associated sequences, the prostate cancer screen 
typically includes comparing genes identified in different tissues, e.g., normal and cancerous
tissues, or tumor tissue samples from patients who have metastatic disease vs. non metastatic
tissue. Other suitable tissue comparisons include comparing prostate cancer samples with
metastatic cancer samples from other cancers, such as lung, breast, gastrointestinal cancers,
ovarian, etc. Samples of different stages of prostate cancer, e.g., survivor tissue, drug
resistant states, and tissue undergoing metastasis, are applied to biochips comprising nucleic
acid probes. The samples are first microdissected, if applicable, and treated as is known in
the art for the preparation of mRNA. Suitable biochips are commercially available, e.g., from
Affymetrix. Gene expression profiles are generated and the data analyzed.

In one embodiment, the genes showing changes in expression as between normal and 
disease states are compared to genes expressed in other normal tissues, preferably normal 
prostate, but also including, and not limited to lung, heart, brain, liver, breast, kidney, muscle, 
colon, small intestine, large intestine, spleen, bone, and placenta. In a preferred embodiment,
those genes identified during the prostate cancer screen that are expressed in a significant
amount in other tissues are removed from the profile, although in some embodiments, this is
not necessary. That is, when screening for drugs, it is usually preferable that the target be
disease specific, to minimize possible side effects on other organs were there expression.
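
The removal of genes that are significantly expressed in other normal tissues can be sketched
as a filter over a table of expression levels. The Python below is a minimal illustration
under stated assumptions: the function name filter_disease_specific, the gene labels, the
dictionary layout, and the numeric cutoff are all hypothetical, and the text does not define
what level counts as a "significant amount."

```python
def filter_disease_specific(candidates, normal_expression, threshold):
    """Keep candidate genes whose expression stays below `threshold`
    in every listed normal tissue.

    candidates: iterable of gene identifiers from the cancer screen.
    normal_expression: {gene: {tissue: expression_level}} (hypothetical layout).
    threshold: assumed cutoff for "significant" expression.
    Genes with no normal-tissue data are kept, since nothing rules them out.
    """
    kept = []
    for gene in candidates:
        levels = normal_expression.get(gene, {})
        if all(level < threshold for level in levels.values()):
            kept.append(gene)
    return kept


# Hypothetical example data: GENE_B is strongly expressed in normal lung.
normal = {
    "GENE_A": {"lung": 5, "liver": 8},
    "GENE_B": {"lung": 250, "liver": 4},
}
specific = filter_disease_specific(["GENE_A", "GENE_B", "GENE_C"], normal, 100)
```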

In a preferred embodiment, prostate cancer sequences are those that are up-regulated
in prostate cancer or related conditions; that is, the expression of these genes is higher in the
prostate cancer tissue as compared to non-cancerous tissue. "Up-regulation" as used herein
often means at least about a two-fold change, preferably at least about a three fold change,
with at least about five-fold or higher being preferred. Another embodiment is directed to
sequences up-regulated in non-malignant conditions relative to normal.

Unigene cluster identification numbers and accession numbers herein are for the
GenBank sequence database and the sequences of the accession numbers are hereby
expressly incorporated by reference. GenBank is known in the art, see, e.g., Benson, et al.
(1998) Nucleic Acids Research 26:1-7 and http://www.ncbi.nlm.nih.gov/. Sequences are also
available in other databases, e.g., European Molecular Biology Laboratory (EMBL) and
DNA Database of Japan (DDBJ). U.S. Patent Application Nos. 09/687,576 and 09/976,858 (-
001-3) further disclose related sequences, compositions, and methods of diagnosis and
treatment of prostate cancer and related conditions and are hereby expressly incorporated by
reference.

In another preferred embodiment, prostate cancer sequences are those that are
down-regulated in the prostate cancer; that is, the expression of these genes is lower in prostate
cancer tissue as compared to non-cancerous tissue. "Down-regulation" as used herein often
means at least about a two-fold change, preferably at least about a three fold change, with at
least about five-fold or higher being preferred.
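
The fold-change criteria for up- and down-regulation described above can be expressed
compactly. The following Python sketch is an illustration only: the function name
classify_regulation and the min_fold parameter are assumptions introduced here, with the
default set to the two-fold figure given in the text; passing 3.0 or 5.0 applies the more
preferred criteria.

```python
def classify_regulation(tumor_level, normal_level, min_fold=2.0):
    """Return (label, fold_change) comparing tumor to non-cancerous tissue.

    fold_change is always reported as a magnitude >= 1, i.e., tumor/normal
    for up-regulation and normal/tumor for down-regulation.
    """
    if tumor_level <= 0 or normal_level <= 0:
        raise ValueError("expression levels must be positive")
    if tumor_level >= min_fold * normal_level:
        return "up-regulated", tumor_level / normal_level
    if normal_level >= min_fold * tumor_level:
        return "down-regulated", normal_level / tumor_level
    fold = max(tumor_level / normal_level, normal_level / tumor_level)
    return "unchanged", fold
```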

Informatics

The ability to identify genes that are over or under expressed in prostate cancer can
additionally provide high-resolution, high-sensitivity datasets which can be used in the areas
of diagnostics, therapeutics, drug development, pharmacogenetics, protein structure,
biosensor development, and other related areas. For example, the expression profiles can be
used in diagnostic or prognostic evaluation of patients with prostate cancer. Or as another
example, subcellular toxicological information can be generated to better direct drug structure
and activity correlation (see Anderson, Pharmaceutical Proteomics: Targets, Mechanism, and
Function, paper presented at the IBC Proteomics conference, Coronado, CA (June 11-12,
1998)). Subcellular toxicological information can also be utilized in a biological sensor
device to predict the likely toxicological effect of chemical exposures and likely tolerable
exposure thresholds (see U.S. Patent No. 5,811,231). Similar advantages accrue from
datasets relevant to other biomolecules and bioactive agents (e.g., nucleic acids, saccharides,
lipids, drugs, and the like).

Thus, in another embodiment, the present invention provides a database that includes 
at least one set of assay data. The data contained in the database is acquired, e.g., using array
analysis either singly or in a library format. The database can be in a form in which data can
be maintained and transmitted, but is preferably an electronic database. The electronic
database of the invention can be maintained on an electronic device allowing for the storage
of and access to the database, such as a personal computer, but is preferably distributed on a
wide area network, such as the World Wide Web.

The focus of the present section on databases that include peptide sequence data is for
clarity of illustration only. It will be apparent to those of skill in the art that similar databases
can be assembled for assay data acquired using an assay of the invention.

The compositions and methods for identifying and/or quantitating the relative and/or
absolute abundance of a variety of molecular and macromolecular species from a biological
sample undergoing prostate cancer, i.e., the identification of prostate cancer-associated
sequences described herein, provide an abundance of information, which can be correlated
with pathological conditions, predisposition to disease, drug testing, therapeutic monitoring,
gene-disease causal linkages, identification of correlates of immunity and physiological
status, among others. Although the data generated from the assays of the invention is suited
for manual review and analysis, in a preferred embodiment, prior data processing using
high-speed computers is utilized.

An array of methods for indexing and retrieving biomolecular information is known
in the art. For example, U.S. Patents 6,023,659 and 5,966,712 disclose a relational database
system for storing biomolecular sequence information in a manner that allows sequences to
be catalogued and searched according to one or more protein function hierarchies. U.S.
Patent 5,953,727 discloses a relational database having sequence records containing
information in a format that allows a collection of partial-length DNA sequences to be
catalogued and searched according to association with one or more sequencing projects for
obtaining full-length sequences from the collection of partial length sequences. U.S. Patent
5,706,498 discloses a gene database retrieval system for making a retrieval of a gene
sequence similar to a sequence data item in a gene database based on the degree of similarity
between a key sequence and a target sequence. U.S. Patent 5,538,897 discloses a method
using mass spectroscopy fragmentation patterns of peptides to identify amino acid sequences
in computer databases by comparison of predicted mass spectra with experimentally-derived
mass spectra using a closeness-of-fit measure. U.S. Patent 5,926,818 discloses a
multi-dimensional database comprising a functionality for multi-dimensional data analysis
described as on-line analytical processing (OLAP), which entails the consolidation of
projected and actual data according to more than one consolidation path or dimension. U.S.
Patent 5,295,261 reports a hybrid database structure in which the fields of each database
record are divided into two classes, navigational and informational data, with navigational
fields stored in a hierarchical topological map which can be viewed as a tree structure or as
the merger of two or more such tree structures.

See also Mount, et al. (2001) Bioinformatics CSH Press; Durbin, et al. (eds. 1999) 
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids 
Cambridge Univ. Press; Baxevanis and Ouellette (eds., 1998) Bioinformatics: A Practical
Guide to the Analysis of Genes and Proteins Wiley-Liss; Rashidi and Buehler (1999)
Bioinformatics: Basic Applications in Biological Science and Medicine CRC Press; Setubal,
et al. (eds. 1997) Introduction to Computational Molecular Biology Brooks/Cole; Misener
and Krawetz (eds. 2000) Bioinformatics: Methods and Protocols Humana Press; Higgins and
Taylor (eds. 2000) Bioinformatics: Sequence, Structure, and Databanks: A Practical
Approach Oxford Univ. Press; Brown (2001) Bioinformatics: A Biologist's Guide to
Biocomputing and the Internet Eaton Pub; Han and Kamber (2000) Data Mining: Concepts
and Techniques Kaufmann Pub.; and Waterman (1995) Introduction to Computational
Biology: Maps, Sequences, and Genomes Chapman and Hall.

The present invention provides a computer database comprising a computer and 
software for storing in computer-retrievable form assay data records cross-tabulated, e.g., 
with data specifying the source of the target-containing sample from which each sequence 
specificity record was obtained.

In an exemplary embodiment, at least one of the sources of target-containing sample
is from a control tissue sample known to be free of pathological disorders. In a variation, at
least one of the sources is a known pathological tissue specimen, e.g., a neoplastic lesion or
another tissue specimen to be analyzed for prostate cancer. In another variation, the assay
records cross-tabulate one or more of the following parameters for each target species in a
sample: (1) a unique identification code, which can include, e.g., a target molecular structure
and/or characteristic separation coordinate (e.g., electrophoretic coordinates); (2) sample
source; and (3) absolute and/or relative quantity of the target species present in the sample.
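
An assay record carrying the three parameters listed above, and a cross-tabulation of such
records by target and sample source, might be represented as follows. This Python sketch is
one possible layout offered for illustration; the class name AssayRecord, its field names,
and the function cross_tabulate are assumptions introduced here and are not prescribed by
the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayRecord:
    target_id: str      # (1) unique identification code (hypothetical format)
    sample_source: str  # (2) sample source, e.g., control or pathological tissue
    quantity: float     # (3) absolute or relative quantity of the target


def cross_tabulate(records):
    """Build a {target_id: {sample_source: quantity}} table from records.

    If the same target and source occur twice, the later record wins.
    """
    table = defaultdict(dict)
    for rec in records:
        table[rec.target_id][rec.sample_source] = rec.quantity
    return {target: dict(by_source) for target, by_source in table.items()}


# Hypothetical records for two targets measured in two sample sources.
records = [
    AssayRecord("T001", "control prostate", 1.0),
    AssayRecord("T001", "neoplastic lesion", 4.2),
    AssayRecord("T002", "control prostate", 3.0),
]
table = cross_tabulate(records)
```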

The invention also provides for the storage and retrieval of a collection of target data 
in a computer data storage apparatus, which can include magnetic disks, optical disks,
magneto-optical disks, DRAM, SRAM, SGRAM, SDRAM, RDRAM, DDR RAM, magnetic
bubble memory devices, and other data storage devices, including CPU registers and on-CPU
data storage arrays. Typically, the target data records are stored as a bit pattern in an array of
magnetic domains on a magnetizable medium or as an array of charge states or transistor gate
states, such as an array of cells in a DRAM device (e.g., each cell comprised of a transistor
and a charge storage area, which may be on the transistor). In one embodiment, the invention
provides such storage devices, and computer systems built therewith, comprising a bit pattern 
encoding a protein expression fingerprint record comprising unique identifiers for at least 10 
target data records cross-tabulated with target source. 

When the target is a peptide or nucleic acid, the invention preferably provides a
method for identifying related peptide or nucleic acid sequences, comprising performing a
computerized comparison between a peptide or nucleic acid sequence assay record stored in
or retrieved from a computer storage device or database and at least one other sequence. The
comparison can include a sequence analysis or comparison algorithm or computer program
embodiment thereof (e.g., FASTA, TFASTA, GAP, BESTFIT) and/or the comparison may
be of the relative amount of a peptide or nucleic acid sequence in a pool of sequences
determined from a polypeptide or nucleic acid sample of a specimen.

The invention also preferably provides a magnetic disk, such as an IBM-compatible
(DOS, Windows, Windows95/98/2000, Windows NT, OS/2) or other format (e.g., Linux,
SunOS, Solaris, AIX, SCO Unix, VMS, MV, Macintosh, etc.) floppy diskette or hard (fixed,
Winchester) disk drive, comprising a bit pattern encoding data from an assay of the invention
in a file format suitable for retrieval and processing in a computerized sequence analysis,
comparison, or relative quantitation method.

The invention also provides a network, comprising a plurality of computing devices 
linked via a data link, such as an Ethernet cable (coax or 10BaseT), telephone line, ISDN
line, wireless network, optical fiber, or other suitable signal transmission medium, whereby at
least one network device (e.g., computer, disk array, etc.) comprises a pattern of magnetic
domains (e.g., magnetic disk) and/or charge domains (e.g., an array of DRAM cells)
composing a bit pattern encoding data acquired from an assay of the invention.

The invention also provides a method for transmitting assay data that includes 
generating an electronic signal on an electronic communications device, such as a modem, 
ISDN terminal adapter, DSL, cable modem, ATM switch, or the like, wherein the signal
includes (in native or encrypted format) a bit pattern encoding data from an assay or a
database comprising a plurality of assay results obtained by the method of the invention.

In a preferred embodiment, the invention provides a computer system for comparing a 
query target to a database containing an array of data structures, such as an assay result 
obtained by the method of the invention, and ranking database targets based on the degree of
identity and gap weight to the target data. A central processor is preferably initialized to load
and execute the computer program for alignment and/or comparison of the assay results. 
Data for a query target is entered into the central processor via an I/O device. Execution of 
the computer program results in the central processor retrieving the assay data from the data 
file, which comprises a binary description of an assay result. 

The target data or record and the computer program can be transferred to secondary
memory, which is typically random access memory (e.g., DRAM, SRAM, SGRAM, or
SDRAM). Targets are ranked according to the degree of correspondence between a selected
assay characteristic (e.g., binding to a selected affinity moiety) and the same characteristic of
the query target and results are output via an I/O device. For example, a central processor
can be a conventional computer (e.g., Intel Pentium, PowerPC, Alpha, PA-8000, SPARC,
MIPS 4400, MIPS 10000, VAX, etc.); a program can be a commercial or public domain
molecular biology software package (e.g., UWGCG Sequence Analysis Software, Darwin); a
data file can be an optical or magnetic disk, a data server, a memory device (e.g., DRAM,
SRAM, SGRAM, SDRAM, EPROM, bubble memory, flash memory, etc.); an I/O device can
be a terminal comprising a video display and a keyboard, a modem, an ISDN terminal
adapter, an Ethernet port, a punched card reader, a magnetic strip reader, or other suitable I/O
device.

The invention also preferably provides the use of a computer system, such as that 
described above, which comprises: (1) a computer; (2) a stored bit pattern encoding a 
collection of peptide sequence specificity records obtained by the methods of the invention, 
which may be stored in the computer; (3) a comparison target, such as a query target; and (4) 
a program for alignment and comparison, typically with rank-ordering of comparison results
on the basis of computed similarity values. 
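
The rank-ordering of database targets by computed similarity to a query can be illustrated
with a short Python sketch. This is a simplified stand-in under stated assumptions, not the
alignment programs named above: it scores similarity with the standard-library
difflib.SequenceMatcher ratio rather than FASTA or BESTFIT, it applies no gap weights, and
the function name rank_targets and the example sequences are hypothetical.

```python
from difflib import SequenceMatcher


def rank_targets(query, database):
    """Rank database sequences by similarity to the query, best first.

    database: {target_name: sequence_string}.
    Returns a list of (target_name, similarity) with similarity in [0, 1].
    Ties are broken alphabetically by target name.
    """
    scored = [
        (SequenceMatcher(None, query, seq, autojunk=False).ratio(), name)
        for name, seq in database.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored]


# Hypothetical peptide records compared against a hypothetical query.
database = {
    "target_1": "MKTAYIAKQR",
    "target_2": "MKTAYIAKQL",
    "target_3": "GGGGGGGGGG",
}
ranking = rank_targets("MKTAYIAKQR", database)
```

In practice a dedicated alignment tool would replace the similarity function, while the
sort-and-report step would stay essentially the same.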

Characteristics of prostate cancer-associated proteins 

Prostate cancer proteins of the present invention may be classified as secreted
proteins, transmembrane proteins, or intracellular proteins. In one embodiment, the prostate
cancer protein is an intracellular protein. Intracellular proteins may be found in the
cytoplasm and/or in the nucleus. Intracellular proteins are involved in all aspects of cellular
function and replication (including, e.g., signaling pathways); aberrant expression of such
proteins often results in unregulated or dysregulated cellular processes (see, e.g., Alberts (ed.
1994) Molecular Biology of the Cell (3d ed.) Garland). For example, many intracellular
proteins have enzymatic activity such as protein kinase activity, protein phosphatase activity,
protease activity, nucleotide cyclase activity, polymerase activity and the like. Intracellular
proteins also serve as docking proteins that are involved in organizing complexes of proteins,
or targeting proteins to various subcellular localizations, and are involved in maintaining the
structural integrity of organelles.

An increasingly appreciated concept in characterizing proteins is the presence in the 
proteins of one or more structural motifs for which defined functions have been attributed. In
addition to the highly conserved sequences found in the enzymatic domain of proteins, highly
conserved sequences have been identified in proteins that are involved in protein-protein
interaction. For example, Src-homology-2 (SH2) domains bind tyrosine-phosphorylated
targets in a sequence dependent manner. PTB domains, which are distinct from SH2
domains, also bind tyrosine phosphorylated targets. SH3 domains bind to proline-rich
targets. In addition, PH domains, tetratricopeptide repeats and WD domains, to name only a
few, have been shown to mediate protein-protein interactions. Some of these may also be
involved in binding to phospholipids or other second messengers. As will be appreciated by
one of ordinary skill in the art, these motifs can be identified on the basis of amino acid
sequence; thus, an analysis of the sequence of proteins may provide insight into both the
enzymatic potential of the molecule and/or molecules with which the protein may associate.
One useful database is Pfam (protein families), which is a large collection of multiple
sequence alignments and hidden Markov models covering many common protein domains.
Versions are available via the internet from Washington University in St. Louis, the Sanger
Center in England, and the Karolinska Institute in Sweden (see, e.g., Bateman, et al. (2000)
Nuc. Acids Res. 28:263-266; Sonnhammer, et al. (1997) Proteins 28:405-420; Bateman, et al.
(1999) Nuc. Acids Res. 27:260-262; and Sonnhammer, et al. (1998) Nuc. Acids Res. 26:320-
322).

In another embodiment, the prostate cancer sequences are transmembrane proteins.
Transmembrane proteins are molecules that span a phospholipid bilayer of a cell. They may
have an intracellular domain, an extracellular domain, or both. The intracellular domains of
such proteins may have a number of functions including those already described for
intracellular proteins. For example, the intracellular domain may have enzymatic activity
and/or may serve as a binding site for additional proteins. Frequently the intracellular
domain of transmembrane proteins serves both roles. For example, certain receptor tyrosine
kinases have both protein kinase activity and SH2 domains. In addition, autophosphorylation
of tyrosines on the receptor molecule itself creates binding sites for additional SH2 domain
containing proteins.

Transmembrane proteins may contain from one to many transmembrane domains.
For example, receptor tyrosine kinases, certain cytokine receptors, receptor guanylyl cyclases
and receptor serine/threonine protein kinases contain a single transmembrane domain.
However, various other proteins including channels and adenylyl cyclases contain numerous
transmembrane domains. Many important cell surface receptors such as G protein coupled
receptors (GPCRs) are classified as "seven transmembrane domain" proteins, as they contain
7 membrane spanning regions. Characteristics of transmembrane domains include
approximately 17 consecutive hydrophobic amino acids that may be followed by charged
amino acids. Therefore, upon analysis of the amino acid sequence of a particular protein, the
localization and number of transmembrane domains within the protein may be predicted (see,
e.g., PSORT web site http://psort.nibb.ac.jp/). Important transmembrane protein receptors
include, but are not limited to, the insulin receptor, insulin-like growth factor receptor, human
growth hormone receptor, glucose transporters, transferrin receptor, epidermal growth factor
receptor, low density lipoprotein receptor, leptin receptor,
and interleukin receptors, e.g., IL-1 receptor, IL-2 receptor, etc.
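
The sequence-based prediction described above can be illustrated with a minimal sketch. It scans an amino acid sequence for runs of at least 17 consecutive hydrophobic residues and reports whether a charged residue follows each run. The residue sets, the function names, and the strict "consecutive run" rule are illustrative assumptions; they are not taken from this specification or from PSORT, which uses more elaborate methods.

```python
# Residue classes below are illustrative assumptions, not definitions from the text.
HYDROPHOBIC = set("AVLIMFWC")
CHARGED = set("KRDE")


def find_hydrophobic_runs(sequence, min_length=17):
    """Return (start, end) pairs (0-based, end exclusive) for every run of at
    least `min_length` consecutive hydrophobic residues in `sequence`."""
    seq = sequence.upper()
    runs = []
    start = None
    for i, residue in enumerate(seq):
        if residue in HYDROPHOBIC:
            if start is None:
                start = i  # a new hydrophobic run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Handle a run that extends to the end of the sequence.
    if start is not None and len(seq) - start >= min_length:
        runs.append((start, len(seq)))
    return runs


def followed_by_charged(sequence, run_end):
    """True if the residue immediately after a run is charged."""
    return run_end < len(sequence) and sequence[run_end].upper() in CHARGED


if __name__ == "__main__":
    example = "MKT" + "L" * 20 + "KRS"  # hypothetical sequence
    for start, end in find_hydrophobic_runs(example):
        print(start, end, followed_by_charged(example, end))
```

A candidate region found this way is only a hint; a windowed hydropathy average would tolerate the occasional polar residue that real membrane-spanning segments contain.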

The extracellular domains of transmembrane proteins are diverse; however, conserved
motifs are found repeatedly among various extracellular domains. Conserved structure
and/or functions have been ascribed to different extracellular motifs. Many extracellular
domains are involved in binding to other molecules. In one aspect, extracellular domains are
found on receptors. Factors that bind the receptor domain include circulating ligands, which
may be peptides, proteins, or small molecules such as adenosine and the like. For example,
growth factors such as EGF, FGF, and PDGF are circulating growth factors that bind to their
cognate receptors to initiate a variety of cellular responses. Other factors include cytokines,
mitogenic factors, neurotrophic factors and the like. Extracellular domains also bind to
cell-associated molecules. In this respect, they mediate cell-cell interactions. Cell-associated
ligands can be tethered to the cell, e.g., via a glycosylphosphatidylinositol (GPI) anchor, or
may themselves be transmembrane proteins. Extracellular domains also associate with the
extracellular matrix and contribute to the maintenance of the cell structure.

Prostate cancer proteins that are transmembrane are particularly preferred in the
present invention as they are readily accessible targets for immunotherapeutics, as are
described herein. In addition, as outlined below, transmembrane proteins can also be useful
in imaging modalities. Antibodies may be used to label such readily accessible proteins in
situ. Alternatively, antibodies can also label intracellular proteins, in which case samples are
typically permeabilized to provide access to intracellular proteins. In addition, some
membrane proteins can be processed to release a soluble protein, or to expose a residual
fragment. Released soluble proteins may be useful diagnostic markers; processed residual
protein fragments may be useful prostate markers of disease.

It will also be appreciated by those in the art that a transmembrane protein can be
made soluble by removing transmembrane sequences, e.g., through recombinant methods.
Furthermore, transmembrane proteins that have been made soluble can be made to be
secreted through recombinant means by adding an appropriate signal sequence.

In another embodiment, the prostate cancer proteins are secreted proteins, the
secretion of which can be either constitutive or regulated. These proteins may have a signal
peptide or signal sequence that targets the molecule to the secretory pathway. Secreted
proteins are involved in numerous physiological events; by virtue of their circulating nature,
they often serve to transmit signals to various other cell types. The secreted protein may
function in an autocrine manner (acting on the cell that secreted the factor), a paracrine
manner (acting on cells in close proximity to the cell that secreted the factor), an endocrine
manner (acting on cells at a distance, e.g., secretion into the blood stream), or an exocrine
manner (secretion, e.g., through a duct or to an adjacent epithelial surface, as in sweat glands,
sebaceous glands, pancreatic ducts, lacrimal glands, mammary glands, wax producing glands
of the ear, etc.). Thus secreted molecules find use in modulating or altering numerous aspects
of physiology. Prostate cancer proteins that are secreted proteins are particularly preferred in
the present invention as they serve as good targets for diagnostic markers, e.g., for blood,
plasma, serum, or stool tests. Those which are enzymes may be antibody or small molecule
targets. Others may be useful as vaccine targets, e.g., via CTL mechanisms.



Use of prostate cancer nucleic acids 

As described above, a prostate cancer sequence is initially identified by substantial
nucleic acid and/or amino acid sequence homology or linkage to the prostate cancer
sequences outlined herein. Such homology can be based upon the overall nucleic acid or
amino acid sequence, and is generally determined as outlined below, using either homology
programs or hybridization conditions. Typically, linked sequences on an mRNA are found on
the same molecule.

The prostate cancer nucleic acid sequences of the invention, e.g., the sequences in
Tables 1A-4, can be fragments of larger genes, i.e., they are nucleic acid segments. "Genes"
in this context includes coding regions, non-coding regions, and mixtures of coding and
non-coding regions. Accordingly, as will be appreciated by those in the art, using the sequences
provided herein, extended sequences, in either direction, of the prostate cancer genes can be
obtained, using techniques well known in the art for cloning either longer sequences or the
full length sequences; see Ausubel, et al., supra. Much can be done by informatics, and many
sequences can be clustered to include multiple sequences corresponding to a single gene, e.g.,
using systems such as UniGene (see, http://www.ncbi.nlm.nih.gov/UniGene/).

Once the prostate cancer nucleic acid is identified, it can be cloned and, if necessary,
its constituent parts recombined to form the entire prostate cancer nucleic acid coding regions
or the entire mRNA sequence. Once isolated from its natural source, e.g., contained within a
plasmid or other vector or excised therefrom as a linear nucleic acid segment, the
recombinant prostate cancer nucleic acid can be further used as a probe to identify and isolate
other prostate cancer nucleic acids, e.g., extended coding regions. It can also be used as a
"precursor" nucleic acid to make modified or variant prostate cancer nucleic acids and
proteins.

The prostate cancer nucleic acids of the present invention are used in several ways. In
a first embodiment, nucleic acid probes to the prostate cancer nucleic acids are made and
attached to biochips to be used in screening and diagnostic methods, as outlined below, or for
administration, e.g., for gene therapy, vaccine, and/or antisense applications. Alternatively,
the prostate cancer nucleic acids that include coding regions of prostate cancer proteins can
be put into expression vectors for the expression of prostate cancer proteins, again for
screening purposes or for administration to a patient.

In a preferred embodiment, nucleic acid probes to prostate cancer nucleic acids (both
the nucleic acid sequences outlined in the figures and/or the complements thereof) are made.
The nucleic acid probes attached to the biochip are designed to be substantially
complementary to the prostate cancer nucleic acids, i.e., the target sequence (either the target
sequence of the sample or other probe sequences, e.g., in sandwich assays), such that
hybridization of the target sequence and the probes of the present invention occurs. As
outlined below, this complementarity need not be perfect; there may be base pair mismatches
which will interfere with hybridization between the target sequence and the single stranded
nucleic acids of the present invention. However, if the number of mutations is so great that
no hybridization can occur under even the least stringent of hybridization conditions, the
sequence is not a complementary target sequence. Thus, by "substantially complementary"
herein is meant that the probes are sufficiently complementary to the target sequences to
hybridize under normal reaction conditions, particularly high stringency conditions, as
outlined herein.
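
As a rough illustration of imperfect complementarity, the following sketch counts base mismatches between a probe and the perfect complement of a target site. The mismatch threshold and all function names are hypothetical; in practice, whether a probe hybridizes is determined by the stringency conditions described herein, not by a fixed mismatch count.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(probe, target_site):
    """Count positions where `probe` differs from the perfect complement of
    `target_site`. Both sequences are written 5'->3' and must be equal length."""
    if len(probe) != len(target_site):
        raise ValueError("probe and target site must have the same length")
    perfect_probe = reverse_complement(target_site)
    return sum(1 for p, q in zip(probe.upper(), perfect_probe) if p != q)


def within_mismatch_budget(probe, target_site, max_mismatches=2):
    """Crude stand-in for 'substantially complementary'; the threshold is an
    illustrative assumption, not a value from the specification."""
    return count_mismatches(probe, target_site) <= max_mismatches


if __name__ == "__main__":
    target = "ATGCCGTA"                      # hypothetical target site
    probe = reverse_complement(target)       # perfectly complementary probe
    print(count_mismatches(probe, target))   # no mismatches for a perfect probe
```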

A nucleic acid probe is generally single stranded but can be partially single and
partially double stranded. The strandedness of the probe is dictated by the structure,
composition, and properties of the target sequence. In general, the nucleic acid probes range
from about 8 to about 100 bases long, with from about 10 to about 80 bases being preferred,
and from about 30 to about 50 bases being particularly preferred. That is, generally whole
genes are not used. In some embodiments, much longer nucleic acids can be used, up to
hundreds of bases.

In a preferred embodiment, more than one probe per sequence is used, with either
overlapping probes or probes to different sections of the target being used. That is, two,
three, four or more probes, with three being preferred, are used to build in a redundancy for a
particular target. The probes can be overlapping (i.e., have some sequence in common), or
separate. In some cases, PCR primers may be used to amplify signal for higher sensitivity.
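
The use of several probes per target can be sketched as follows. The function picks evenly spaced windows along a target sequence, which overlap when the target is short relative to the combined probe length and are separate otherwise. The function name and the even-spacing rule are illustrative assumptions; the default of three windows follows the stated preference, and the default length of 40 falls within the particularly preferred range of about 30 to about 50 bases.

```python
def tile_probe_windows(target, probe_length=40, n_probes=3):
    """Return a list of (start, window) pairs, where each window is a
    probe-sized stretch of `target` (5'->3'). An actual probe would be the
    complement of the window; that step is omitted here."""
    if n_probes < 1:
        raise ValueError("n_probes must be at least 1")
    if probe_length > len(target):
        raise ValueError("probe_length exceeds target length")
    last_start = len(target) - probe_length
    if n_probes == 1:
        starts = [0]
    else:
        # Spread window start positions evenly from 0 to last_start.
        starts = [round(i * last_start / (n_probes - 1)) for i in range(n_probes)]
    return [(s, target[s:s + probe_length]) for s in starts]


if __name__ == "__main__":
    target = "ACGT" * 25  # hypothetical 100-base target
    for start, window in tile_probe_windows(target):
        print(start, len(window))
```

With a 100-base target and three 40-base windows, adjacent windows share 10 bases; with a longer target the same call yields non-overlapping windows.
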
As will be appreciated by those in the art, nucleic acids can be attached or
immobilized to a solid support in a wide variety of ways. By "immobilized" and grammatical
equivalents herein is meant that the association or binding between the nucleic acid probe and
the solid support is sufficient to be stable under the conditions of binding, washing, analysis, and
removal as outlined below. The binding can typically be covalent or non-covalent. By
"non-covalent binding" and grammatical equivalents herein is meant one or more of electrostatic,
hydrophilic, and hydrophobic interactions. Included in non-covalent binding is the covalent
attachment of a molecule, such as streptavidin, to the support and the non-covalent binding of
the biotinylated probe to the streptavidin. By "covalent binding" and grammatical
equivalents herein is meant that the two moieties, the solid support and the probe, are
attached by at least one bond, including sigma bonds, pi bonds and coordination bonds.
Covalent bonds can be formed directly between the probe and the solid support or can be
formed by a cross linker or by inclusion of a specific reactive group on either the solid
support or the probe or both molecules. Immobilization may also involve a combination of
covalent and non-covalent interactions.

In general, the probes are attached to the biochip in a wide variety of ways, as will be
appreciated by those in the art. As described herein, the nucleic acids can either be
synthesized first, with subsequent attachment to the biochip, or can be directly synthesized on
the biochip.

The biochip comprises a suitable solid substrate. By "substrate" or "solid support" or
other grammatical equivalents herein is meant a material that can be modified to contain
discrete individual sites appropriate for the attachment or association of the nucleic acid
probes and is amenable to at least one detection method. As will be appreciated by those in
the art, the number of possible substrates is very large, and substrates include, but are not limited to,
glass and modified or functionalized glass, plastics (including acrylics, polystyrene and
copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene,
polyurethanes, Teflon, etc.), polysaccharides, nylon or nitrocellulose, resins, silica or
silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses,
plastics, etc. In general, the substrates allow optical detection and do not appreciably
fluoresce. A preferred substrate is described in WO0055627, herein incorporated by
reference in its entirety.

Generally the substrate is planar, although as will be appreciated by those in the art,
other configurations of substrates may be used as well. For example, the probes may be
placed on the inside surface of a tube, for flow-through sample analysis to minimize sample
volume. Similarly, the substrate may be flexible, such as a flexible foam, including closed
cell foams made of particular plastics.

In a preferred embodiment, the surface of the biochip and the probe may be
derivatized with chemical functional groups for subsequent attachment of the two. Thus, e.g.,
the biochip is derivatized with a chemical functional group including, but not limited to,
amino groups, carboxy groups, oxo groups and thiol groups, with amino groups being
particularly preferred. Using these functional groups, the probes can be attached using
functional groups on the probes. For example, nucleic acids containing amino groups can be
attached to surfaces comprising amino groups, e.g., using linkers as are known in the art; e.g.,
homo- or hetero-bifunctional linkers as are well known (see 1994 Pierce Chemical Company
catalog, technical section on cross-linkers, pages 155-200). In addition, in some cases,
additional linkers, such as alkyl groups (including substituted and heteroalkyl groups) may be
used.

In this embodiment, oligonucleotides are synthesized as is known in the art, and then
attached to the surface of the solid support. As will be appreciated by those skilled in the art,
either the 5' or 3' terminus may be attached to the solid support, or attachment may be via an
internal nucleoside.

In another embodiment, the immobilization to the solid support may be very strong,
yet non-covalent. For example, biotinylated oligonucleotides can be made, which bind to
surfaces covalently coated with streptavidin, resulting in attachment.

Alternatively, the oligonucleotides may be synthesized on the surface, as is known in
the art. For example, photoactivation techniques utilizing photopolymerization compounds
and techniques are used. In a preferred embodiment, the nucleic acids can be synthesized in
situ, using well known photolithographic techniques, such as those described in WO
95/25116; WO 95/35505; U.S. Patent Nos. 5,700,637 and 5,445,934; and references cited
within, all of which are expressly incorporated by reference; these methods of attachment
form the basis of the Affymetrix GeneChip™ technology.

Often, amplification-based assays are performed to measure the expression level of
prostate cancer-associated sequences. These assays are typically performed in conjunction
with reverse transcription. In such assays, a prostate cancer-associated nucleic acid sequence
acts as a template in an amplification reaction (e.g., Polymerase Chain Reaction, or PCR). In
a quantitative amplification, the amount of amplification product will be proportional to the
amount of template in the original sample. Comparison to appropriate controls provides a
measure of the amount of prostate cancer-associated RNA. Methods of quantitative
amplification are well known to those of skill in the art. Detailed protocols for quantitative
PCR are provided, e.g., in Innis, et al. (1990) PCR Protocols: A Guide to Methods and
Applications Academic Press.
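
One common way to turn the comparison to controls into a number is the comparative threshold-cycle calculation, sketched below. This calculation is not described in this specification; it is offered only as an illustration, and it assumes that a reference (housekeeping) transcript is measured alongside the target and that the amount of product multiplies by a fixed factor per cycle (2.0 for ideal doubling). All parameter names are hypothetical.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control,
                        efficiency=2.0):
    """Fold change of a target transcript in a sample relative to a control,
    normalized to a reference transcript.

    Each ct_* argument is the threshold cycle at which the corresponding
    amplification signal crosses a fixed detection level. `efficiency` is the
    assumed per-cycle amplification factor.
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_sample - delta_control
    # Fewer cycles to threshold means more starting template, hence the minus sign.
    return efficiency ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical values: the target reaches threshold 3 cycles earlier in the
    # sample than in the control, after normalization to the reference.
    print(relative_expression(22.0, 18.0, 25.0, 18.0))
```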

In some embodiments, a TaqMan based assay is used to measure expression.
TaqMan based assays use a fluorogenic oligonucleotide probe that contains a 5' fluorescent
dye and a 3' quenching agent. The probe hybridizes to a PCR product, but cannot itself be
extended due to a blocking agent at the 3' end. When the PCR product is amplified in
subsequent cycles, the 5' nuclease activity of the polymerase, e.g., AmpliTaq, results in the
cleavage of the TaqMan probe. This cleavage separates the 5' fluorescent dye and the 3'
quenching agent, thereby resulting in an increase in fluorescence as a function of
amplification (see, e.g., literature provided by Perkin-Elmer, e.g., www2.perkin-elmer.com).

Other suitable amplification methods include, but are not limited to, ligase chain
reaction (LCR) (see Wu and Wallace (1989) Genomics 4:560-569, Landegren, et al. (1988)
Science 241:1077-1080, and Barringer, et al. (1990) Gene 89:117-122), transcription
amplification (Kwoh, et al. (1989) Proc. Natl. Acad. Sci. USA 86:1173-1177), self-sustained
sequence replication (Guatelli, et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874-1878), dot
PCR, and linker adapter PCR, etc.

Expression of prostate cancer proteins from nucleic acids 

In a preferred embodiment, prostate cancer nucleic acids, e.g., those encoding prostate
cancer proteins, are used to make a variety of expression vectors to express prostate cancer
proteins which can then be used in screening assays, as described below. Expression vectors
and recombinant DNA technology are well known to those of skill in the art (see, e.g.,
Ausubel, supra, and Fernandez and Hoeffler (eds. 1999) Gene Expression Systems Academic
Press) and are used to express proteins. The expression vectors may be either self-replicating
extrachromosomal vectors or vectors which integrate into a host genome. Generally, these
expression vectors include transcriptional and translational regulatory nucleic acid operably
linked to the nucleic acid encoding the prostate cancer protein. The term "control sequences"
refers to DNA sequences used for the expression of an operably linked coding sequence in a
particular host organism. Control sequences that are suitable for prokaryotes include, e.g., a
promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cells are
known to utilize promoters, polyadenylation signals, and enhancers.

Nucleic acid is "operably linked" when it is placed into a functional relationship with
another nucleic acid sequence. For example, DNA for a presequence or secretory leader is
operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in
the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding
sequence if it affects the transcription of the sequence; a ribosome binding site is operably
linked to a coding sequence if it is positioned so as to facilitate translation; and sequences
may be operably linked when they are physically linked on the same molecule. Generally,
"operably linked" means that the DNA sequences being linked are contiguous, and, in the
case of a secretory leader, contiguous and in reading phase. However, enhancers do not have
to be contiguous. Linking is typically accomplished by ligation at convenient restriction
sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers are used in
accordance with conventional practice. Transcriptional and translational regulatory nucleic
acid will generally be appropriate to the host cell used to express the prostate cancer protein.
Numerous types of appropriate expression vectors and suitable regulatory sequences are
known in the art for a variety of host cells.

In general, transcriptional and translational regulatory sequences may include, but are
not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop
sequences, translational start and stop sequences, and enhancer or activator sequences. In a
preferred embodiment, the regulatory sequences include a promoter and transcriptional start
and stop sequences.

Promoter sequences encode either constitutive or inducible promoters. The promoters
may be either naturally occurring promoters or hybrid promoters. Hybrid promoters, which
combine elements of more than one promoter, are also known in the art, and are useful in the
present invention.

In addition, an expression vector may comprise additional elements. For example, the
expression vector may have two replication systems, thus allowing it to be maintained in two
organisms, e.g., in mammalian or insect cells for expression and in a prokaryotic host for
cloning and amplification. Furthermore, for integrating expression vectors, the expression
vector contains at least one sequence homologous to the host cell genome, and preferably two
homologous sequences which flank the expression construct. The integrating vector may be
directed to a specific locus in the host cell by selecting the appropriate homologous sequence
for inclusion in the vector. Constructs for integrating vectors are well known in the art (e.g.,
Fernandez and Hoeffler, supra).

In addition, in a preferred embodiment, the expression vector contains a selectable 
marker gene to allow the selection of transformed host cells. Selection genes are well known 
in the art and will vary with the host cell used. 

The prostate cancer proteins of the present invention are produced by culturing a host
cell transformed with an expression vector containing nucleic acid encoding a prostate cancer
protein, under the appropriate conditions to induce or cause expression of the prostate cancer
protein. Conditions appropriate for prostate cancer protein expression will vary with the
choice of the expression vector and the host cell, and will be easily ascertained by one skilled
in the art through routine experimentation or optimization. For example, the use of
constitutive promoters in the expression vector will require optimizing the growth and
proliferation of the host cell, while the use of an inducible promoter requires the appropriate
growth conditions for induction. In addition, in some embodiments, the timing of the harvest
is important. For example, the baculoviral systems used in insect cell expression are lytic
viruses, and thus harvest time selection can be crucial for product yield.

Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect and
animal cells, including mammalian cells. Of particular interest are Saccharomyces cerevisiae
and other yeasts, E. coli, Bacillus subtilis, Sf9 cells, C129 cells, 293 cells, Neurospora, BHK,
CHO, COS, HeLa cells, HUVEC (human umbilical vein endothelial cells), THP1 cells (a
macrophage cell line) and various other human cells and cell lines.

In a preferred embodiment, the prostate cancer proteins are expressed in mammalian
cells. Mammalian expression systems are also known in the art, and include retroviral and
adenoviral systems. One expression vector system is a retroviral vector system such as is
generally described in PCT/US97/01019 and PCT/US97/01048, both of which are hereby
expressly incorporated by reference. Of particular use as mammalian promoters are the
promoters from mammalian viral genes, since the viral genes are often highly expressed and
have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor
virus LTR promoter, adenovirus major late promoter, herpes simplex virus promoter, and the
CMV promoter (see, e.g., Fernandez and Hoeffler, supra). Typically, transcription
termination and polyadenylation sequences recognized by mammalian cells are regulatory
regions located 3' to the translation stop codon and thus, together with the promoter elements,
flank the coding sequence. Examples of transcription terminator and polyadenylation signals
include those derived from SV40.

The methods of introducing exogenous nucleic acid into mammalian hosts, as well as
other hosts, are well known in the art, and will vary with the host cell used. Techniques
include dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated
transfection, protoplast fusion, electroporation, viral infection, encapsulation of the
polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei.

In a preferred embodiment, prostate cancer proteins are expressed in bacterial
systems. Bacterial expression systems are well known in the art. Promoters from
bacteriophage may also be used and are known in the art. In addition, synthetic promoters
and hybrid promoters are also useful; e.g., the tac promoter is a hybrid of the trp and lac
promoter sequences. Furthermore, a bacterial promoter can include naturally occurring
promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and
initiate transcription. In addition to a functioning promoter sequence, an efficient ribosome
binding site is desirable. The expression vector may also include a signal peptide sequence
that provides for secretion of the prostate cancer protein in bacteria. The protein is either
secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located
between the inner and outer membrane of the cell (gram-negative bacteria). The bacterial
expression vector may also include a selectable marker gene to allow for the selection of
bacterial strains that have been transformed. Suitable selection genes include genes which
render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin,
kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes,
such as those in the histidine, tryptophan and leucine biosynthetic pathways. These
components are assembled into expression vectors. Expression vectors for bacteria are well
known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris,
and Streptococcus lividans, among others (e.g., Fernandez and Hoeffler, supra). The
bacterial expression vectors are transformed into bacterial host cells using techniques well
known in the art, such as calcium chloride treatment, electroporation, and others.

In one embodiment, prostate cancer proteins are produced in insect cells. Expression
vectors for the transformation of insect cells, and in particular, baculovirus-based expression
vectors, are well known in the art.

In a preferred embodiment, prostate cancer protein is produced in yeast cells. Yeast
expression systems are well known in the art, and include expression vectors for
Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha,
Kluyveromyces fragilis and K. lactis, Pichia guilliermondii and P. pastoris,
Schizosaccharomyces pombe, and Yarrowia lipolytica.

The prostate cancer protein may also be made as a fusion protein, using techniques
well known in the art. Thus, e.g., for the creation of monoclonal antibodies, if the desired
epitope is small, the prostate cancer protein may be fused to a carrier protein to form an
immunogen. Alternatively, the prostate cancer protein may be made as a fusion protein to
increase expression, or for other reasons. For example, when the prostate cancer protein is a
prostate cancer peptide, the nucleic acid encoding the peptide may be linked to other nucleic
acid for expression purposes.

In a preferred embodiment, the prostate cancer protein is purified or isolated after
expression. Prostate cancer proteins may be isolated or purified in a variety of ways known
to those skilled in the art depending on what other components are present in the sample.
Standard purification methods include electrophoretic, molecular, immunological and
chromatographic techniques, including ion exchange, hydrophobic, affinity, and
reverse-phase HPLC chromatography, and chromatofocusing. For example, the prostate cancer
protein may be purified using a standard anti-prostate cancer protein antibody column.
Ultrafiltration and diafiltration techniques, in conjunction with protein concentration, are also
useful. For general guidance in suitable purification techniques, see Scopes (1982) Protein
Purification Springer-Verlag. The degree of purification necessary will vary depending on
the use of the prostate cancer protein. In some instances no purification will be necessary.

Once expressed and purified if necessary, the prostate cancer proteins and nucleic
acids are useful in a number of applications. They may be used as immunoselection reagents,
as vaccine reagents, as screening agents, etc.

Variants of prostate cancer proteins 

In one embodiment, the prostate cancer proteins are derivative or variant prostate
cancer proteins as compared to the wild-type sequence. That is, as outlined more fully below,
the derivative prostate cancer peptide will often contain at least one amino acid substitution,
deletion or insertion, with amino acid substitutions being particularly preferred. The amino
acid substitution, insertion, or deletion may occur at almost any residue within the prostate
cancer peptide.

Also included within one embodiment of prostate cancer proteins of the present
invention are amino acid sequence variants. These variants typically fall into one or more of
three classes: substitutional, insertional, or deletional variants. These variants ordinarily are
prepared by site specific mutagenesis of nucleotides in the DNA encoding the prostate cancer
protein, using cassette or PCR mutagenesis or other techniques well known in the art, to
produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell
culture as outlined above. However, variant prostate cancer protein fragments having up to
about 100-150 residues may be prepared by in vitro synthesis using established techniques.
Amino acid sequence variants are characterized by the predetermined nature of the variation,
a feature that sets them apart from naturally occurring allelic or interspecies variation of the
prostate cancer protein amino acid sequence. The variants typically exhibit the same
qualitative biological activity as the naturally occurring analogue, although variants can also
be selected which have modified characteristics as will be more fully outlined below.

While the site or region for introducing an amino acid sequence variation is
predetermined, the mutation per se need not be predetermined. For example, in order to
optimize the performance of a mutation at a given site, random mutagenesis may be
conducted at the target codon or region and the expressed prostate cancer variants screened
for the optimal combination of desired activity. Techniques for making substitution
mutations at predetermined sites in DNA having a known sequence are well known, e.g.,
M13 primer mutagenesis and PCR mutagenesis. Screening of the mutants is done using
assays of prostate cancer protein activities.
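
The idea of varying a predetermined codon while leaving the particular mutation open can be sketched computationally. The example below enumerates all 64 codons at one codon position of a coding sequence and groups the resulting DNA variants by the protein they encode. Exhaustive enumeration is used here as a stand-in for random mutagenesis, and the function names and example sequence are hypothetical. The codon table is the standard genetic code, with "*" marking stop codons.

```python
from itertools import product

# Standard genetic code, built with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon; '*' marks a stop codon.
    Trailing bases that do not form a full codon are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def saturate_codon(coding_seq, codon_index):
    """Replace the codon at `codon_index` (0-based) with every possible codon.
    Returns a dict mapping each encoded protein to the DNA variants encoding it."""
    seq = coding_seq.upper()
    start = codon_index * 3
    if start + 3 > len(seq):
        raise IndexError("codon_index is outside the coding sequence")
    variants = {}
    for codon in CODON_TABLE:
        dna = seq[:start] + codon + seq[start + 3:]
        variants.setdefault(translate(dna), []).append(dna)
    return variants


if __name__ == "__main__":
    wild_type = "ATGGCTAAA"  # hypothetical coding sequence: Met-Ala-Lys
    library = saturate_codon(wild_type, 1)
    print(len(library), "distinct proteins from 64 DNA variants")
```

In a laboratory setting the variants would then be expressed and screened with activity assays; the sketch covers only the enumeration step.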

Amino acid substitutions are typically of single residues; insertions usually will be on
the order of from about 1 to about 20 amino acids, although considerably larger insertions may be
tolerated. Deletions range from about 1 to about 20 residues, although in some cases
deletions may be much larger.

Substitutions, deletions, insertions or a combination thereof may be used to arrive at a
final derivative. Generally these changes are done on a few amino acids to minimize the
alteration of the molecule. However, larger changes may be tolerated in certain
circumstances. When small alterations in the characteristics of the prostate cancer protein are
desired, substitutions are generally made in accordance with the amino acid substitution
relationships provided in the definition section.

The variants typically exhibit the same qualitative biological activity and will elicit 
the same immune response as the naturally-occurring analog, although variants also are 
selected to modify the characteristics of the prostate cancer proteins as needed. Alternatively, 
the variant may be designed such that the biological activity of the prostate cancer protein is 
altered. For example, glycosylation sites may be altered or removed. 

Substantial changes in function or immunological identity are made by selecting 
substitutions that are less conservative than those described above. For example, 
substitutions may be made which more significantly affect: the structure of the polypeptide 
backbone in the area of the alteration, for example the alpha-helical or beta-sheet structure; 
the charge or hydrophobicity of the molecule at the target site; or the bulk of the side chain. 
The substitutions which in general are expected to produce the greatest changes in the 
polypeptide's properties are those in which (a) a hydrophilic residue, e.g., serinyl or threonyl, 
is substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or 
alanyl; (b) a cysteine or proline is substituted for (or by) another residue; (c) a residue having 
an electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an 
electronegative residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, 
e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine. 
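
The four classes (a) through (d) above lend themselves to a simple lookup. The following is a minimal sketch, assuming one-letter residue codes and assuming that each class contains only the exemplary residues named in the text (which introduces them with "e.g.", so the real classes may be broader). The names `disruptive_classes` and `_between` are hypothetical.

```python
from typing import List, Set

# Residue sets limited to the examples the text names (one-letter codes).
HYDROPHILIC = {"S", "T"}                  # serinyl, threonyl
HYDROPHOBIC = {"L", "I", "F", "V", "A"}   # leucyl, isoleucyl, phenylalanyl, valyl, alanyl
ELECTROPOSITIVE = {"K", "R", "H"}         # lysyl, arginyl, histidyl
ELECTRONEGATIVE = {"E", "D"}              # glutamyl, aspartyl
BULKY = {"F"}                             # phenylalanine
NO_SIDE_CHAIN = {"G"}                     # glycine


def _between(a: str, b: str, first: Set[str], second: Set[str]) -> bool:
    """True if one residue is in `first` and the other in `second`, in either order."""
    return (a in first and b in second) or (a in second and b in first)


def disruptive_classes(original: str, replacement: str) -> List[str]:
    """Return which of classes (a)-(d) a single-residue substitution falls into."""
    if original == replacement:
        return []
    classes = []
    if _between(original, replacement, HYDROPHILIC, HYDROPHOBIC):
        classes.append("a")
    if original in {"C", "P"} or replacement in {"C", "P"}:
        classes.append("b")
    if _between(original, replacement, ELECTROPOSITIVE, ELECTRONEGATIVE):
        classes.append("c")
    if _between(original, replacement, BULKY, NO_SIDE_CHAIN):
        classes.append("d")
    return classes
```

An empty result means only that the substitution matches none of the listed examples; it does not establish that the substitution is conservative.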

Covalent modifications of prostate cancer polypeptides are included within the scope 
of this invention. One type of covalent modification includes reacting targeted amino acid 
residues of a prostate cancer polypeptide with an organic derivatizing agent that is capable of 
reacting with selected side chains or the N- or C-terminal residues of a prostate cancer 
polypeptide. Derivatization with bifunctional agents is useful, for instance, for crosslinking 
prostate cancer polypeptides to a water-insoluble support matrix or surface for use in the 
method for purifying anti-prostate cancer polypeptide antibodies or screening assays, as is 
more fully described below. Commonly used crosslinking agents include, e.g., 
1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxysuccinimide esters, e.g., esters 
with 4-azidosalicylic acid, homobifunctional imidoesters, including disuccinimidyl esters 
such as 3,3'-dithiobis(succinimidylpropionate), bifunctional maleimides such as 
bis-N-maleimido-1,8-octane and agents such as methyl-3-((p-azidophenyl)dithio)propioimidate. 

Other modifications include deamidation of glutaminyl and asparaginyl residues to 
the corresponding glutamyl and aspartyl residues, respectively, hydroxylation of proline and 
lysine, phosphorylation of hydroxyl groups of serinyl, threonyl or tyrosyl residues, 
methylation of the amino groups of the lysine, arginine, and histidine side chains (e.g., pp. 
79-86, Creighton (1983) Proteins: Structure and Molecular Properties Freeman), acetylation 
of the N-terminal amine, and amidation of a C-terminal carboxyl group. 

Another type of covalent modification of the prostate cancer polypeptide included 
within the scope of this invention comprises altering the native glycosylation pattern of the 
polypeptide. "Altering the native glycosylation pattern" is intended for purposes herein to 
mean deleting one or more carbohydrate moieties found in native sequence prostate cancer 
polypeptide, and/or adding one or more glycosylation sites that are not present in the native 
sequence prostate cancer polypeptide. Glycosylation patterns can be altered in many ways. 
For example, the use of different cell types to express prostate cancer-associated sequences 
can result in different glycosylation patterns. 

Addition of glycosylation sites to prostate cancer polypeptides may also be 
accomplished by altering the amino acid sequence thereof. The alteration may be made, e.g., 
by the addition of, or substitution by, one or more serine or threonine residues to the native 
sequence prostate cancer polypeptide (for O-linked glycosylation sites). The prostate cancer 
amino acid sequence may optionally be altered through changes at the DNA level, 
particularly by mutating the DNA encoding the prostate cancer polypeptide at preselected 
bases such that codons are generated that will translate into the desired amino acids. 

Another means of increasing the number of carbohydrate moieties on the prostate 
cancer polypeptide is by chemical or enzymatic coupling of glycosides to the polypeptide. 
Such methods are described in the art, e.g., in WO 87/05330, and pp. 259-306 in Aplin and 
Wriston (1981) CRC Crit. Rev. Biochem. 

Removal of carbohydrate moieties present on the prostate cancer polypeptide may be 
accomplished chemically or enzymatically or by mutational substitution of codons encoding 
for amino acid residues that serve as targets for glycosylation. Chemical deglycosylation 
techniques are known in the art and described, e.g., by Hakimuddin, et al. (1987) Arch. 
Biochem. Biophys. 259:52-57; and Edge, et al. (1981) Anal. Biochem. 118:131-137. 
Enzymatic cleavage of carbohydrate moieties on polypeptides can be achieved by the use of a 
variety of endo- and exo-glycosidases as described by Thotakura, et al. (1987) Meth. 
Enzymol. 138:350-359. 

Another type of covalent modification of prostate cancer polypeptides comprises linking the 
prostate cancer polypeptide to one of a variety of nonproteinaceous polymers, e.g., 
polyethylene glycol, polypropylene glycol, or polyoxyalkylenes, in the manner set forth in 
U.S. Patent Nos. 4,640,835; 4,496,689; 4,301,144; 4,670,417; 4,791,192; or 4,179,337. 

Prostate cancer polypeptides of the present invention may also be modified in a way 
to form chimeric molecules comprising a prostate cancer polypeptide fused to another, 
heterologous polypeptide or amino acid sequence. In one embodiment, such a chimeric 
molecule comprises a fusion of a prostate cancer polypeptide with a tag polypeptide which 
provides an epitope to which an anti-tag antibody can selectively bind. The epitope tag is 
generally placed at the amino- or carboxyl-terminus of the prostate cancer polypeptide. The 
presence of such epitope-tagged forms of a prostate cancer polypeptide can be detected using 
an antibody against the tag polypeptide. Also, provision of the epitope tag enables the 
prostate cancer polypeptide to be readily purified by affinity purification using an anti-tag 
antibody or another type of affinity matrix that binds to the epitope tag. In an alternative 
embodiment, the chimeric molecule may comprise a fusion of a prostate cancer polypeptide 
with an immunoglobulin or a particular region of an immunoglobulin. For a bivalent form of 
the chimeric molecule, such a fusion could be to the Fc region of an IgG molecule. 

Various tag polypeptides and their respective antibodies are well known in the art. 
Examples include poly-histidine (poly-his) or poly-histidine-glycine (poly-his-gly) tags; HIS6 
and metal chelation tags; the flu HA tag polypeptide and its antibody 12CA5 (Field, et al. 
(1988) Mol. Cell. Biol. 8:2159-2165); the c-myc tag and the 8F9, 3C7, 6E10, G4, B7, and 
9E10 antibodies thereto (Evan, et al. (1985) Molecular and Cellular Biology 5:3610-3616); 
and the Herpes Simplex virus glycoprotein D (gD) tag and its antibody (Paborsky, et al. 
(1990) Protein Engineering 3:547-553). Other tag polypeptides include the Flag-peptide 
(Hopp, et al. (1988) BioTechnology 6:1204-1210); the KT3 epitope peptide (Martin, et al. 
(1992) Science 255:192-194); tubulin epitope peptide (Skinner, et al. (1991) J. Biol. Chem. 
266:15163-15166); and the T7 gene 10 protein peptide tag (Lutz-Freyermuth, et al. (1990) 
Proc. Natl. Acad. Sci. USA 87:6393-6397). 

Also included are other prostate cancer proteins of the prostate cancer family, and 
prostate cancer proteins from other organisms, which are cloned and expressed as outlined 
below. Thus, probe or degenerate polymerase chain reaction (PCR) primer sequences may be 
used to find other related prostate cancer proteins from humans or other organisms. As will 
be appreciated by those in the art, particularly useful probe and/or PCR primer sequences 
include the unique areas of the prostate cancer nucleic acid sequence. As is generally known 
in the art, preferred PCR primers are from about 15 to about 35 nucleotides in length, with 
from about 20 to about 30 being preferred, and may contain inosine as needed. The 
conditions for the PCR reaction are well known in the art (e.g., Innis, PCR Protocols, supra). 
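
The primer length ranges stated above can be checked mechanically. The following is a minimal sketch, assuming primers are written with the letters A, C, G, T and I (for inosine) and treating the "about" limits as hard cutoffs, which is a simplification. The function name `primer_length_tier` and its return labels are hypothetical.

```python
ALLOWED_BASES = set("ACGTI")  # I stands for inosine


def primer_length_tier(primer: str) -> str:
    """Report which stated length range a primer falls into."""
    sequence = primer.upper()
    if not sequence or set(sequence) - ALLOWED_BASES:
        raise ValueError("primer must contain only A, C, G, T or I")
    length = len(sequence)
    if 20 <= length <= 30:
        return "20-30"    # narrower range stated in the text
    if 15 <= length <= 35:
        return "15-35"    # broader range stated in the text
    return "outside"


tier = primer_length_tier("ACGTIACGTIACGTIACGTI")  # made-up 20-mer containing inosine
```

Such a check addresses length only; it says nothing about uniqueness of the primer within the prostate cancer nucleic acid sequence.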

Antibodies to prostate cancer proteins 

In a preferred embodiment, when the prostate cancer protein is to be used to generate 
antibodies, e.g., for immunotherapy or immunodiagnosis, the prostate cancer protein should 
share at least one epitope or determinant with the full length protein. By "epitope" or 
"determinant" herein is typically meant a portion of a protein which will generate and/or bind 
an antibody or T-cell receptor in the context of MHC. Thus, in most instances, antibodies 
made to a smaller prostate cancer protein will be able to bind to the full-length protein, 
particularly linear epitopes. In a preferred embodiment, the epitope is unique; that is, 
antibodies generated to a unique epitope show little or no cross-reactivity. 

Methods of preparing polyclonal antibodies are known to the skilled artisan (e.g., 
Coligan, supra; and Harlow and Lane, supra). Polyclonal antibodies can be raised in a 
mammal, e.g., by one or more injections of an immunizing agent and, if desired, an adjuvant. 
Typically, the immunizing agent and/or adjuvant will be injected in the mammal by multiple 
subcutaneous or intraperitoneal injections. The immunizing agent may include a protein 
encoded by a nucleic acid of the figures or fragment thereof or a fusion protein thereof. It 
may be useful to conjugate the immunizing agent to a protein known to be immunogenic in 
the mammal being immunized. Examples of such immunogenic proteins include but are not 
limited to keyhole limpet hemocyanin, serum albumin, bovine thyroglobulin, and soybean 
trypsin inhibitor. Examples of adjuvants which may be employed include Freund's complete 
adjuvant and MPL-TDM adjuvant (monophosphoryl Lipid A, synthetic trehalose 
dicorynomycolate). The immunization protocol may be selected by one skilled in the art 
without undue experimentation. 

The antibodies may, alternatively, be monoclonal antibodies. Monoclonal antibodies 
may be prepared using hybridoma methods, such as those described by Kohler and Milstein 
(1975) Nature 256:495. In a hybridoma method, a mouse, hamster, or other appropriate host 
animal, is typically immunized with an immunizing agent to elicit lymphocytes that produce 
or are capable of producing antibodies that will specifically bind to the immunizing agent. 
Alternatively, the lymphocytes may be immunized in vitro. The immunizing agent will 
typically include a polypeptide encoded by a nucleic acid of Tables 1A-4 or fragment thereof, 
or a fusion protein thereof. Generally, either peripheral blood lymphocytes ("PBLs") are 
used if cells of human origin are desired, or spleen cells or lymph node cells are used if 
non-human mammalian sources are desired. The lymphocytes are then fused with an 
immortalized cell line using a suitable fusing agent, such as polyethylene glycol, to form a 
hybridoma cell (see pp. 59-103 in Goding (1986) Monoclonal Antibodies: Principles and 
Practice Academic Press). Immortalized cell lines are usually transformed mammalian cells, 
particularly myeloma cells of rodent, bovine and human origin. Usually, rat or mouse 
myeloma cell lines are employed. The hybridoma cells may be cultured in a suitable culture 
medium that preferably contains one or more substances that inhibit the growth or survival of 
the unfused, immortalized cells. For example, if the parental cells lack the enzyme 
hypoxanthine guanine phosphoribosyl transferase (HGPRT or HPRT), the culture medium 
for the hybridomas typically will include hypoxanthine, aminopterin, and thymidine ("HAT 
medium"), which substances prevent the growth of HGPRT-deficient cells. 

In one embodiment, the antibodies are bispecific antibodies. Bispecific antibodies are 
monoclonal, preferably human or humanized, antibodies that have binding specificities for at 
least two different antigens or that have binding specificities for two epitopes on the same 
antigen. In one embodiment, one of the binding specificities is for a protein encoded by a 
nucleic acid of Tables 1A-4 or a fragment thereof, the other one is for another antigen, and 
preferably for a cell-surface protein or receptor or receptor subunit, preferably one that is 
tumor specific. Alternatively, tetramer-type technology may create multivalent reagents. 

In a preferred embodiment, the antibodies to prostate cancer protein are capable of 
reducing or eliminating a biological function of a prostate cancer protein, as is described 
below. That is, the addition of anti-prostate cancer protein antibodies (either polyclonal or 
preferably monoclonal) to prostate cancer tissue (or cells containing prostate cancer) may 
reduce or eliminate the prostate cancer. Generally, at least a 25% decrease in activity, 
growth, size or the like is preferred, with at least about 50% being particularly preferred and 
about a 95-100% decrease being especially preferred. 

In a preferred embodiment the antibodies to the prostate cancer proteins are 
humanized antibodies (e.g., Xenerex Biosciences; Medarex, Inc.; Abgenix, Inc.; Protein 
Design Labs, Inc.). Humanized forms of non-human (e.g., murine) antibodies are chimeric 
molecules of immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, 
Fab, Fab', F(ab')2 or other antigen-binding subsequences of antibodies) which contain 
minimal sequence derived from non-human immunoglobulin. Humanized antibodies include 
human immunoglobulins (recipient antibody) in which residues from a complementarity 
determining region (CDR) of the recipient are replaced by residues from a CDR of a 
non-human species (donor antibody) such as mouse, rat or rabbit having the desired specificity, 
affinity and capacity. In some instances, Fv framework residues of the human 
immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies 
may also comprise residues which are found neither in the recipient antibody nor in the 
imported CDR or framework sequences. In general, a humanized antibody will comprise 
substantially all of at least one, and typically two, variable domains, in which all or 
substantially all of the CDR regions correspond to those of a non-human immunoglobulin 
and all or substantially all of the framework (FR) regions are those of a human 
immunoglobulin consensus sequence. The humanized antibody optimally also will comprise 
at least a portion of an immunoglobulin constant region (Fc), typically that of a human 
immunoglobulin (Jones, et al. (1986) Nature 321:522-525; Riechmann, et al. (1988) Nature 
332:323-327; and Presta (1992) Curr. Op. Struct. Biol. 2:593-596). Humanization can be 
essentially performed following methods of Winter and co-workers (see, e.g., Jones, et al. 
(1986) Nature 321:522-525; Riechmann, et al. (1988) Nature 332:323-327; and Verhoeyen, et 
al. (1988) Science 239:1534-1536), by substituting rodent CDRs or CDR sequences for the 
corresponding sequences of a human antibody. Accordingly, such humanized antibodies are 
chimeric antibodies (U.S. Patent No. 4,816,567), wherein substantially less than an intact 
human variable domain has been substituted by the corresponding sequence from a 
non-human species. 

Human antibodies can also be produced using various techniques known in the art, 
including phage display libraries (Hoogenboom and Winter (1991) J. Mol. Biol. 227:381-388; 
Marks, et al. (1991) J. Mol. Biol. 222:581-597) or the preparation of human monoclonal 
antibodies (e.g., p77 in Cole, et al. (1985) Monoclonal Antibodies and Cancer Therapy Liss; 
and Boerner, et al. (1991) J. Immunol. 147(1):86-95). Similarly, human antibodies can be 
made by introducing human immunoglobulin loci into transgenic animals, e.g., mice in 
which the endogenous immunoglobulin genes have been partially or completely inactivated. 
Upon challenge, human antibody production is observed, which closely resembles that seen 
in humans in most respects, including gene rearrangement, assembly, and antibody repertoire. 
This approach is described, e.g., in U.S. Patent Nos. 5,545,807; 5,545,806; 5,569,825; 
5,625,126; 5,633,425; 5,661,016, and in the following scientific publications: Marks, et al. 
(1992) Bio/Technology 10:779-783; Lonberg, et al. (1994) Nature 368:856-859; Morrison 
(1994) Nature 368:812-13; Fishwild, et al. (1996) Nature Biotechnology 14:845-51; 
Neuberger (1996) Nature Biotechnology 14:826; Lonberg and Huszar (1995) Intern. Rev. 
Immunol. 13:65-93. 

By immunotherapy is meant treatment of prostate cancer with an antibody raised 
against prostate cancer proteins. As used herein, immunotherapy can be passive or active. 
Passive immunotherapy as defined herein is the passive transfer of antibody to a recipient 
(patient). Active immunization is the induction of antibody and/or T-cell responses in a 
recipient (patient). Induction of an immune response is the result of providing the recipient 
with an antigen to which antibodies are raised. As appreciated by one of ordinary skill in the 
art, the antigen may be provided by injecting a polypeptide against which antibodies are 
desired to be raised into a recipient, or contacting the recipient with a nucleic acid capable of 
expressing the antigen and under conditions for expression of the antigen, leading to an 
immune response. 

In a preferred embodiment the prostate cancer proteins against which antibodies are 
raised are secreted proteins as described above. Without being bound by theory, antibodies 
used for treatment bind the secreted protein and prevent it from binding to its receptor, thereby 
inactivating the secreted prostate cancer protein. 

In another preferred embodiment, the prostate cancer protein to which antibodies are 
raised is a transmembrane protein. Without being bound by theory, antibodies used for 
treatment bind the extracellular domain of the prostate cancer protein and prevent it from 
binding to other proteins, such as circulating ligands or cell-associated molecules. The 
antibody may cause down-regulation of the transmembrane prostate cancer protein. As will 
be appreciated by one of ordinary skill in the art, the antibody may be a competitive, 
non-competitive or uncompetitive inhibitor of protein binding to the extracellular domain of the 
prostate cancer protein. The antibody is also often an antagonist of the prostate cancer 
protein. Further, the antibody may prevent activation of the transmembrane prostate cancer 
protein. In one aspect, when the antibody prevents the binding of other molecules to the 
prostate cancer protein, the antibody prevents growth of the cell. The antibody may also be 
used to target or sensitize the cell to cytotoxic agents, including, but not limited to, TNF-α, 
TNF-β, IL-1, INF-γ, and IL-2, or chemotherapeutic agents including 5FU, vinblastine, 
actinomycin D, cisplatin, methotrexate, and the like. In some instances the antibody belongs 
to a sub-type that activates serum complement when complexed with the transmembrane 
protein, thereby mediating cytotoxicity or antibody-dependent cytotoxicity (ADCC). Thus, 
prostate cancer is treated by administering to a patient antibodies directed against the 
transmembrane prostate cancer protein. Antibody-labeling may activate a co-toxin, localize a 
toxin payload, or otherwise provide means to locally ablate cells. 

In another preferred embodiment, the antibody is conjugated to an effector moiety. 
The effector moiety can be a labeling moiety such as a radioactive label or fluorescent label, 
or can be a therapeutic moiety. In one aspect the therapeutic moiety is a small molecule that 
modulates the activity of the prostate cancer protein. In another aspect the therapeutic moiety 
modulates the activity of molecules associated with or in close proximity to the prostate 
cancer protein. The therapeutic moiety may inhibit enzymatic activity such as protease or 
collagenase or protein kinase activity associated with prostate cancer. 

In a preferred embodiment, the therapeutic moiety can also be a cytotoxic agent. In 
this method, targeting the cytotoxic agent to prostate cancer tissue or cells results in a 
reduction in the number of afflicted cells, thereby reducing symptoms associated with 
prostate cancer. Cytotoxic agents are numerous and varied and include, but are not limited 
to, cytotoxic drugs or toxins or active fragments of such toxins. Suitable toxins and their 
corresponding fragments include diphtheria A chain, exotoxin A chain, ricin A chain, abrin A 
chain, curcin, crotin, phenomycin, enomycin, saporin, auristatin, and the like. Cytotoxic 
agents also include radiochemicals made by conjugating radioisotopes to antibodies raised 
against prostate cancer proteins, or binding of a radionuclide to a chelating agent that has 
been covalently attached to the antibody. Targeting the therapeutic moiety to transmembrane 
prostate cancer proteins not only serves to increase the local concentration of therapeutic 
moiety in the prostate cancer afflicted area, but also serves to reduce deleterious side effects, 
e.g., by binding to normal tissues, that may be associated with the therapeutic moiety. 

In another preferred embodiment, the prostate cancer protein against which the 
antibodies are raised is an intracellular protein. In this case, the antibody may be conjugated 
to a protein which facilitates entry into the cell. In one case, the antibody enters the cell by 
endocytosis. In another embodiment, a nucleic acid encoding the antibody is administered to 
the individual or cell. Moreover, where the prostate cancer protein can be targeted within a 
cell, e.g., the nucleus, an antibody thereto contains a signal for that target localization, e.g., a 
nuclear localization signal. 

The prostate cancer antibodies of the invention specifically bind to prostate cancer 
proteins. By "specifically bind" herein is meant that the antibodies bind to the protein with a 
Kd of at least about 0.1 mM, more usually at least about 1 µM, preferably at least about 0.1 
µM or better, and most preferably, 0.01 µM or better. Selectivity of binding is also 
important. 

Detection of prostate cancer sequence for diagnostic and therapeutic applications 

In one aspect, the RNA expression levels of genes are determined for different 
cellular states in the prostate cancer phenotype. After androgen ablation therapy, cells that 
survive the therapy undergo a period of quiescence followed at some time later by active cell 
division. As explained above, there are a variety of expression patterns characteristic of the 
prostate cancer genes involved in androgen-independent prostate cancer. Some genes are 
expressed early in the time course following ablation therapy, then drop off in expression, 
and then express again with emergence of androgen-independence (hi-lo-hi pattern in Table 1A). 
Other genes are expressed early in the time course following ablation therapy, then drop off 
in expression, and do not express again with emergence of androgen-independence (hi-lo-lo 
pattern in Table 1A). Still other genes are not expressed early in the time course, but express 
only with emergence of androgen-independence (lo-lo-hi pattern in Table 1A). Other genes 
are not expressed early in the time course, but then express as androgen is withdrawn and 
continue to express with emergence of androgen-independence (lo-hi-hi pattern in Table 1A). 
Finally, some genes are not expressed early in the time course, but then express as androgen 
is withdrawn and drop off again with emergence of androgen-independence (lo-hi-lo pattern 
in Table 1A). Thus, the data suggest that different antigens are expressed in quiescent cells 
and actively dividing androgen-independent prostate cancer cells. 
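
The pattern labels used above (hi-lo-hi, lo-lo-hi, and so on) can be derived from a measured time course by thresholding each time point. The following is a minimal sketch, assuming three measurements per gene (early after ablation, during androgen withdrawal, and at emergence of androgen independence) and a single cutoff separating "hi" from "lo". The cutoff value, the example measurements and the names `expression_pattern` and `PATTERN_NOTES` are hypothetical; the text does not specify how the calls in the tables were made.

```python
from typing import Sequence

# Descriptions paraphrased from the five three-point patterns named in the text.
PATTERN_NOTES = {
    "hi-lo-hi": "expressed early, drops off, expressed again with androgen independence",
    "hi-lo-lo": "expressed early, drops off, not expressed again",
    "lo-lo-hi": "expressed only with emergence of androgen independence",
    "lo-hi-hi": "expressed as androgen is withdrawn and maintained",
    "lo-hi-lo": "expressed as androgen is withdrawn, then drops off",
}


def expression_pattern(levels: Sequence[float], threshold: float) -> str:
    """Label each time point 'hi' or 'lo' against `threshold` and join with hyphens."""
    return "-".join("hi" if level >= threshold else "lo" for level in levels)


# Made-up expression values for one gene at the three stages, with a made-up cutoff.
pattern = expression_pattern([120.0, 10.0, 95.0], threshold=50.0)
note = PATTERN_NOTES.get(pattern, "pattern not among those described")
```

Because the function accepts any number of time points, the same sketch would yield four-part labels for a four-point time course.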

In another aspect, the RNA expression levels of genes are determined for different 
cellular states in the prostate cancer phenotype. After androgen ablation therapy, cells that 
survive the therapy undergo a period of quiescence followed at some time later by active cell 
division. As explained above, there are a variety of expression patterns characteristic of the 
prostate cancer genes involved in androgen-independent prostate cancer. Some genes are 
expressed early in the time course following ablation therapy, then drop off in expression, 
and then express again with emergence of androgen-independence (hi-lo-lo-hi pattern in 
Table 2A). Other genes are expressed early in the time course following ablation therapy, 
then drop off in expression, and do not express again with emergence of 
androgen-independence (hi-lo-lo-lo and hi-hi-lo-lo pattern in Table 2A). Still other genes are not 
expressed early in the time course, but express only with emergence of 
androgen-independence (lo-lo-lo-hi pattern in Table 2A). Other genes are not expressed early in the 
time course, but then express as androgen is withdrawn and continue to express with 
emergence of androgen-independence (lo-lo-hi-hi pattern in Table 2A). Finally, some genes 
are not expressed early in the time course, but then express as androgen is withdrawn and 
drop off again with emergence of androgen-independence (lo-lo-hi-lo pattern in Table 2A). 
Thus, the data suggest that different antigens are expressed in quiescent cells (during 
androgen withdrawal) and actively dividing androgen-independent prostate cancer cells. 

Effective therapy to combat androgen-independent prostate cancer requires that the 
timing of therapy coincide with expression of the target genes. Patients can be monitored for 
the expression of certain diagnostic antigens that indicate the presence of quiescent cells or 
which indicate the transition to actively dividing androgen-independent prostate cancer cells. 
Thus, therapy to combat androgen-independent prostate cancer should begin at some time 
following androgen ablation therapy, depending on the particular target. Typically the 
transition from quiescence to actively dividing androgen-independent prostate cancer occurs 
between 6-24 months following androgen ablation therapy. Thus, preferred time periods for 
the therapies of the invention are as follows: 

Expression levels of genes in normal tissue (i.e., not undergoing prostate cancer) and 
in prostate cancer tissue (and in some cases, for varying severities of prostate cancer that 
relate to prognosis, as outlined below) or in non-malignant disease are evaluated to provide 
expression profiles. An expression profile of a particular cell state or point of development is 
essentially a "fingerprint" of the state. While two states may have a particular gene similarly 
expressed, the evaluation of a number of genes simultaneously allows the generation of a 
gene expression profile that is reflective of the state of the cell. By comparing expression 
profiles of cells in different states, information regarding which genes are important 
(including both up- and down-regulation of genes) in each of these states is obtained. Then, 
diagnosis may be performed or confirmed to determine whether a tissue sample has the gene 
expression profile of normal or cancerous tissue. This will provide for molecular diagnosis 
of related conditions. 

"Differential expression," or grammatical equivalents as used herein, refers to 
qualitative or quantitative differences in the temporal and/or cellular gene expression patterns 
within and among cells and tissue. Thus, a differentially expressed gene can qualitatively 
have its expression altered, including an activation or inactivation, in, e.g., normal versus 
prostate cancer tissue. Genes may be turned on or turned off in a particular state, relative to 
another state thus permitting comparison of two or more states. A qualitatively regulated 
gene will exhibit an expression pattern within a state or cell type which is detectable by 
standard techniques. Some genes will be expressed in one state or cell type, but not in both. 
Alternatively, the difference in expression may be quantitative, e.g., in that expression is 
increased or decreased; i.e., gene expression is either upregulated, resulting in an increased 
amount of transcript, or downregulated, resulting in a decreased amount of transcript. The 
degree to which expression differs need only be large enough to quantify via standard 
characterization techniques as outlined below, such as by use of Affymetrix GeneChip™ 
expression arrays, Lockhart (1996) Nature Biotechnology 14:1675-1680, hereby expressly 
incorporated by reference. Other techniques include, but are not limited to, quantitative 
reverse transcriptase PCR, northern analysis and RNase protection. As outlined above, 
preferably the change in expression (i.e., upregulation or downregulation) is at least about 
50%, more preferably at least about 100%, more preferably at least about 150%, more 
preferably at least about 200%, with from 300 to at least 1000% being especially preferred. 

Evaluation may be at the gene transcript, or the protein level. The amount of gene 
expression may be monitored using nucleic acid probes to the DNA or RNA equivalent of the 
gene transcript, and the quantification of gene expression levels, or, alternatively, the final 
gene product itself (protein) can be monitored, e.g., with antibodies to the prostate cancer 
protein and standard immunoassays (ELISAs, etc.) or other techniques, including mass 
spectroscopy assays, 2D gel electrophoresis assays, etc. Proteins corresponding to prostate 
cancer genes, i.e., those identified as being important in a prostate cancer or disease 
phenotype, can be evaluated in a prostate cancer diagnostic test. 
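
The preferred magnitudes of change in expression stated above (about 50%, 100%, 150%, 200% and 300% or more) can be applied to a pair of measurements as follows. This is a minimal sketch, assuming the change is computed as the absolute difference relative to the reference (e.g., normal tissue) level; the text does not define the formula, and under this assumption a downregulation can never exceed 100%. The names `percent_change` and `highest_threshold_met` are hypothetical.

```python
from typing import Optional

# Thresholds named in the text, highest first (percent).
THRESHOLDS = (300, 200, 150, 100, 50)


def percent_change(reference: float, sample: float) -> float:
    """Absolute change of `sample` relative to `reference`, in percent."""
    if reference <= 0:
        raise ValueError("reference level must be positive")
    return abs(sample - reference) * 100.0 / reference


def highest_threshold_met(change_pct: float) -> Optional[int]:
    """Return the largest stated threshold that `change_pct` reaches, or None."""
    for cutoff in THRESHOLDS:
        if change_pct >= cutoff:
            return cutoff
    return None


# Made-up transcript levels: reference tissue 100 units, tumor sample 250 units.
change = percent_change(100.0, 250.0)
tier = highest_threshold_met(change)
```

The same two functions apply whether the levels come from transcript measurements or from protein measurements such as an immunoassay.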

In a preferred embodiment, gene expression monitoring is performed simultaneously 
on a number of genes. Multiple protein expression monitoring can be performed as well. 
Similarly, these assays may be performed on an individual basis as well. 

In this embodiment, the prostate cancer nucleic acid probes are attached to biochips as 
outlined herein for the detection and quantification of prostate cancer sequences in a 
particular cell. The assays are further described below in the example. PCR techniques can 
be used to provide greater sensitivity. 

In a preferred embodiment nucleic acids encoding the prostate cancer protein are 
detected. Although DNA or RNA encoding the prostate cancer protein may be detected, of 
particular interest are methods wherein an mRNA encoding a prostate cancer protein is 
detected. Probes to detect mRNA can be a nucleotide/deoxynucleotide probe that is 
complementary to and hybridizes with the mRNA and includes, but is not limited to, 
oligonucleotides, cDNA, or RNA. Probes also should contain a detectable label, as defined 
herein. In one method the mRNA is detected after immobilizing the nucleic acid to be 
examined on a solid support such as nylon membranes and hybridizing the probe with the 
sample. Following washing to remove the non-specifically bound probe, the label is 
detected. In another method detection of the mRNA is performed in situ (in situ 
hybridization or ISH). In this method permeabilized cells or tissue samples are contacted 
with a detectably labeled nucleic acid probe for sufficient time to allow the probe to hybridize 
with the target mRNA. Following washing to remove the non-specifically bound probe, the 
label is detected. For example a digoxigenin labeled riboprobe (RNA probe) that is 
complementary to the mRNA encoding a prostate cancer protein is detected by binding the 
digoxigenin with an anti-digoxigenin secondary antibody and developed with nitro blue 
tetrazolium and 5-bromo-4-chloro-3-indolyl phosphate. 

In a preferred embodiment, various proteins from the three classes of proteins as 
described herein (secreted, transmembrane, or intracellular proteins) are used in diagnostic 
assays. The prostate cancer proteins, antibodies, nucleic acids, modified proteins and cells 
containing prostate cancer sequences are used in diagnostic assays. Such assays may evaluate 
tissues, e.g., by immunohistochemistry, or evaluate body fluids, e.g., blood. The detection may 
be direct, of cells, or indirect, e.g., of products from cells. This can be performed on an 
individual gene or corresponding polypeptide level. In a preferred embodiment, the 
expression profiles are used, preferably in conjunction with high throughput screening 
techniques to allow monitoring for expression profile genes and/or corresponding 
polypeptides. 

As described and defined herein, prostate cancer proteins, including intracellular, 
transmembrane, or secreted proteins, find use as prognostic or diagnostic markers of prostate 
cancer or other prostate conditions. Detection of these proteins in putative prostate cancer 
tissue allows for detection, diagnosis, or prognosis of prostate proliferative disorders 
(malignant and non-malignant) including benign prostate hyperplasia (BPH) and cancer, and 
prostatitis. Diagnosis may also assist in selecting a therapeutic strategy, e.g., based on 
expression profiles and/or comparison to archival samples. In one embodiment, antibodies 
are used to detect prostate cancer proteins, directly or indirectly. A preferred method 
separates proteins from a sample by electrophoresis on a gel (typically a denaturing and 
reducing protein gel, but may be another type of gel, including isoelectric focusing gels and 
the like). Following separation of proteins, the prostate cancer protein is detected, e.g., by 
immunoblotting with antibodies raised against the prostate cancer protein. Methods of 
immunoblotting are well known to those of ordinary skill in the art. 

In another preferred method, antibodies to the prostate cancer protein find use in in 
situ imaging techniques, e.g., in histology and/or in immunohistochemistry (e.g., Asai (ed. 
1993) Methods in Cell Biology: Antibodies in Cell Biology (vol. 37) Academic Press). In this 
method cells are contacted with from one to many antibodies to the prostate cancer protein(s). 
Following washing to remove non-specific antibody binding, the presence of the antibody or 
antibodies is detected. In one embodiment the antibody is detected by incubating with a 
secondary antibody that contains a detectable label. In another method the primary antibody 
to the prostate cancer protein(s) contains a detectable label, e.g., an enzyme marker that can 
act on a substrate. In another preferred embodiment each one of multiple primary antibodies 
contains a distinct and detectable label. This method finds particular use in simultaneous 
screening for a plurality of prostate cancer proteins. As will be appreciated by one of 
ordinary skill in the art, many other histological imaging techniques are also provided by the 
invention. 

In a preferred embodiment the label is detected in a fluorometer which has the ability 
to detect and distinguish emissions of different wavelengths. In addition, a fluorescence 
activated cell sorter (FACS) can be used in the method. 

In another preferred embodiment, antibodies find use in diagnosing prostate cancer 
from blood, serum, plasma, stool, and other samples. Such samples, therefore, are useful as 
samples to be probed or tested for the presence of prostate cancer proteins, which may be 
diagnostic of prostate conditions beyond cancer, e.g., BPH. Antibodies can be used to detect 
a prostate cancer protein by previously described immunoassay techniques including ELISA, 
immunoblotting (western blotting), immunoprecipitation, BIACORE technology, and the 
like. Conversely, the presence of antibodies may indicate an immune response against an 
endogenous prostate cancer protein. 

In a preferred embodiment, in situ hybridization of labeled prostate cancer nucleic 
acid probes to tissue arrays is done. For example, arrays of tissue samples, including prostate 
cancer tissue and/or normal tissue, are made. In situ hybridization (see, e.g., Ausubel, supra) 
is then performed. When comparing the fingerprints between an individual and a standard, 
the skilled artisan can make a diagnosis, a prognosis, or a prediction based on the findings. It 
is further understood that the genes which indicate the diagnosis may differ from those which 
indicate the prognosis and molecular profiling of the condition of the cells may lead to 
distinctions between responsive or refractory conditions or may be predictive of outcomes. 

In a preferred embodiment, the prostate cancer proteins, antibodies, nucleic acids, 
modified proteins, and cells containing prostate cancer sequences are used in prognosis 
assays. As above, gene expression profiles can be generated that correlate to prostate cancer 
or other prostate disorders, in terms of useful aspects of clinical condition, pathology, or other 
information which may be relevant to long term prognosis. Again, this may be done on either 
a protein or gene level, with the use of genes being preferred. Single or multiple genes may 
be useful in various combinations. As above, prostate cancer probes may be attached to 
biochips for the detection and quantification of prostate cancer sequences in a tissue or 
patient. The assays proceed as outlined above for diagnosis. PCR methods may provide more 
sensitive and accurate quantification. 

Assays for therapeutic compounds 

In a preferred embodiment members of the proteins, nucleic acids, and antibodies as 
described herein are used in drug screening assays. The prostate cancer proteins, antibodies, 
nucleic acids, modified proteins, and cells containing prostate cancer sequences are used in 
drug screening assays or by evaluating the effect of drug candidates on a "gene expression 
profile" or expression profile of polypeptides. In a preferred embodiment, the expression 
profiles are used, preferably in conjunction with high throughput screening techniques to 
allow monitoring for expression profile genes after treatment with a candidate agent (e.g., 
Zlokarnik, et al. (1998) Science 279:84-88; Heid (1996) Genome Res. 6:986-94). 

In a preferred embodiment, the prostate cancer proteins, antibodies, nucleic acids, 
modified proteins, and cells containing the native or modified prostate cancer proteins are 
used in screening assays. That is, the present invention provides novel methods for screening 
for compositions which modulate the prostate cancer phenotype or an identified physiological 
function of a prostate cancer protein. As above, this can be done on an individual gene level 
or by evaluating the effect of drug candidates on a "gene expression profile". In a preferred 
embodiment, the expression profiles are used, preferably in conjunction with high throughput 
screening techniques to allow monitoring for expression profile genes after treatment with a 
candidate agent, see Zlokarnik, supra. 

Having identified the differentially expressed genes herein, a variety of assays may be 
executed. In a preferred embodiment, assays may be run on an individual gene or protein 
level. That is, having identified a particular gene as up regulated in prostate cancer, test 
compounds can be screened for the ability to modulate gene expression or for binding to the 
prostate cancer protein. "Modulation" thus includes both an increase and a decrease in gene 
expression. The preferred amount of modulation will depend on the original change of the 
gene expression in normal versus tissue undergoing prostate cancer, with changes of at least 
10%, preferably 50%, more preferably 100-300%, and in some embodiments 300-1000% or 
greater. Thus, if a gene exhibits a 4-fold increase in prostate cancer tissue compared to 
normal tissue, a decrease of about four-fold is often desired; similarly, a 10-fold decrease in 
prostate cancer tissue compared to normal tissue often provides a target value of a 10-fold 
increase in expression to be induced by the test compound. 
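
The relationship between the observed change and the desired modulation can be sketched as follows. This is only an illustration of the arithmetic in the preceding paragraph; the function name and the (direction, fold) return convention are hypothetical.

```python
from typing import Tuple


def target_modulation(cancer_level: float, normal_level: float) -> Tuple[str, float]:
    """Return the (direction, fold) a test compound would need to induce
    to bring expression in prostate cancer tissue back to the normal level.

    Assumption: the target is simply the inverse of the observed fold change.
    """
    if cancer_level <= 0 or normal_level <= 0:
        raise ValueError("expression levels must be positive")
    if cancer_level > normal_level:
        # Gene is upregulated in cancer, so a decrease is desired.
        return "decrease", cancer_level / normal_level
    if cancer_level < normal_level:
        # Gene is downregulated in cancer, so an increase is desired.
        return "increase", normal_level / cancer_level
    return "none", 1.0
```

With the figures used in the text, a gene at 400 units in cancer tissue and 100 units in normal tissue gives a target of a 4-fold decrease, and a gene at 10 units versus 100 units gives a target of a 10-fold increase.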

The amount of gene expression may be monitored using nucleic acid probes and the 
quantification of gene expression levels, or, alternatively, the gene product itself can be 
monitored, e.g., through the use of antibodies to the prostate cancer protein and standard 
immunoassays. Proteomics and separation techniques may also allow quantification of 
expression. 

In a preferred embodiment, gene expression or protein monitoring of a number of 
entities, i.e., an expression profile, is monitored simultaneously. Such profiles will typically 
involve a plurality of those entities described herein. 

In this embodiment, the prostate cancer nucleic acid probes are attached to biochips as 
outlined herein for the detection and quantification of prostate cancer sequences in a 
particular cell. Alternatively, PCR may be used. Thus, a series, e.g., of microtiter plates, may 
be used with dispensed primers in desired wells. A PCR reaction can then be performed and 
analyzed for each well. 

Expression monitoring can be performed to identify compounds that modify the 
expression of one or more prostate cancer-associated sequences, e.g., a polynucleotide 
sequence set out in Tables 1A-4. Generally, in a preferred embodiment, a test modulator is 
added to the cells prior to analysis. Moreover, screens are also provided to identify agents 
that modulate prostate cancer, modulate prostate cancer proteins, bind to a prostate cancer 
protein, or interfere with the binding of a prostate cancer protein and an antibody or other 
binding partner. 

The term "test compound" or "drug candidate" or "modulator" or grammatical 
equivalents as used herein describes a molecule, e.g., protein, oligopeptide, small organic 
molecule, polysaccharide, polynucleotide, etc., to be tested for the capacity to directly or 
indirectly alter the prostate cancer phenotype or the expression of a prostate cancer sequence, 
e.g., a nucleic acid or protein sequence. In preferred embodiments, modulators alter 
expression profiles, or expression profile nucleic acids or proteins provided herein. In one 
embodiment, the modulator suppresses a prostate cancer phenotype, e.g., to a normal or 
non-malignant tissue fingerprint. In another embodiment, a modulator induces a prostate cancer 
phenotype. Generally, a plurality of assay mixtures are run in parallel with different agent 
concentrations to obtain a differential response to the various concentrations. Typically, one 
of these concentrations serves as a negative control, i.e., at zero concentration or below the 
level of detection. 
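
A minimal sketch of how readings from such a parallel concentration series might be reduced to a differential response is given below. The data layout (a mapping from agent concentration to assay signal) and the function name are assumptions for illustration; the zero-concentration mixture is treated as the negative control described above.

```python
from typing import Dict


def differential_response(readings: Dict[float, float]) -> Dict[float, float]:
    """Subtract the negative-control signal from each treated mixture.

    readings maps agent concentration -> assay signal and must include
    the zero-concentration mixture, which serves as the negative control.
    """
    if 0.0 not in readings:
        raise ValueError("a zero-concentration negative control is required")
    baseline = readings[0.0]
    return {
        concentration: signal - baseline
        for concentration, signal in sorted(readings.items())
        if concentration != 0.0
    }
```

For example, hypothetical signals of 10, 12 and 25 at concentrations of 0, 1 and 10 yield differential responses of 2 and 15 for the two treated mixtures.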

Drug candidates encompass numerous chemical classes, though typically they are 
organic molecules, preferably small organic compounds having a molecular weight of more 
than 100 and less than about 2,500 daltons. Preferred small molecules are less than 2000, or 
less than 1500, or less than 1000, or less than 500 D. Candidate agents comprise functional 
groups necessary for structural interaction with proteins, particularly hydrogen bonding, and 
typically include at least an amine, carbonyl, hydroxyl or carboxyl group, preferably at least 
two of the functional chemical groups. The candidate agents often comprise cyclical carbon 
or heterocyclic structures and/or aromatic or polyaromatic structures substituted with one or 
more of the above functional groups. Candidate agents are also found among biomolecules 
including peptides, saccharides, fatty acids, steroids, purines, pyrimidines, derivatives, 
structural analogs, or combinations thereof. Particularly preferred are peptides. 

In one aspect, a modulator will neutralize the effect of a prostate cancer protein. By 
"neutralize" is meant that activity of a protein is inhibited or blocked, along with the 
consequent effect on the cell. 

In certain embodiments, combinatorial libraries of potential modulators will be 
screened for an ability to bind to a prostate cancer polypeptide or to modulate activity. 
Conventionally, new chemical entities with useful properties are generated by identifying a 
chemical compound (called a "lead compound") with some desirable property or activity, 
e.g., inhibiting activity, creating variants of the lead compound, and evaluating the property 
and activity of those variant compounds. Often, high throughput screening (HTS) methods 
are employed for such an analysis. 

In one preferred embodiment, high throughput screening methods involve providing a 
library containing a large number of potential therapeutic compounds (candidate 
compounds). Such "combinatorial chemical libraries" are then screened in one or more 
assays to identify those library members (particular chemical species or subclasses) that 
display a desired characteristic activity. The compounds thus identified can serve as 
conventional "lead compounds" or can themselves be used as potential or actual therapeutics. 

A combinatorial chemical library is a collection of diverse chemical compounds 
generated by either chemical synthesis or biological synthesis by combining a number of 
chemical "building blocks" such as reagents. For example, a linear combinatorial chemical 
library, such as a polypeptide (e.g., mutein) library, is formed by combining a set of chemical 
building blocks called amino acids in most every possible way for a given compound length 
(i.e., the number of amino acids in a polypeptide compound). Millions of chemical 
compounds can be synthesized through such combinatorial mixing of chemical building 
blocks. Gallop, et al. (1994) J. Med. Chem. 37:1233-1251. 
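
The scale of such a linear library can be illustrated with a short sketch. It assumes the 20 standard amino acids as the building-block set and full enumeration of every sequence of a given length; both the names and the one-letter alphabet are illustrative choices rather than part of the specification.

```python
from itertools import product
from typing import Iterator

# One-letter codes for the 20 standard amino acids (assumed building blocks).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def library_size(length: int, alphabet: str = AMINO_ACIDS) -> int:
    """Number of distinct linear sequences of the given length."""
    return len(alphabet) ** length


def enumerate_library(length: int, alphabet: str = AMINO_ACIDS) -> Iterator[str]:
    """Lazily yield every sequence of the given length."""
    return ("".join(residues) for residues in product(alphabet, repeat=length))
```

Under these assumptions a pentapeptide library already contains 20^5 = 3,200,000 members, consistent with the statement that millions of compounds can be produced by combinatorial mixing.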

Preparation and screening of combinatorial chemical libraries is well known to those 
of skill in the art. Such combinatorial chemical libraries include, but are not limited to, 
peptide libraries (see, e.g., U.S. Patent No. 5,010,175, Furka (1991) Pept. Prot. Res. 
37:487-493, Houghton, et al. (1991) Nature 354:84-88), peptoids (PCT Publication No. WO 
91/19735), encoded peptides (PCT Publication WO 93/20242), random bio-oligomers (PCT 
Publication WO 92/00091), benzodiazepines (U.S. Pat. No. 5,288,514), diversomers such as 
hydantoins, benzodiazepines and dipeptides (Hobbs, et al. (1993) Proc. Nat. Acad. Sci. USA 
90:6909-6913), vinylogous polypeptides (Hagihara, et al. (1992) J. Amer. Chem. Soc. 
114:6568-xxx), nonpeptidal peptidomimetics with a Beta-D-Glucose scaffolding 
(Hirschmann, et al. (1992) J. Amer. Chem. Soc. 114:9217-9218), analogous organic 
syntheses of small compound libraries (Chen, et al. (1994) J. Amer. Chem. Soc. 
116:2661-xxx), oligocarbamates (Cho, et al. (1993) Science 261:1303-1305), and/or peptidyl 
phosphonates (Campbell, et al. (1994) J. Org. Chem. 59:658-xxx). See, generally, Gordon, et 
al. (1994) J. Med. Chem. 37:1385-1401), nucleic acid libraries (see, e.g., Stratagene, Corp.), 
peptide nucleic acid libraries (see, e.g., U.S. Patent 5,539,083), antibody libraries (see, e.g., 
Vaughn, et al. (1996) Nature Biotechnology 14:309-314, and PCT/US96/10287), 
carbohydrate libraries (see, e.g., Liang, et al. (1996) Science 274:1520-1522, and U.S. Patent 
No. 5,593,853), and small organic molecule libraries (see, e.g., benzodiazepines, Baum 
(1993) C&EN, Jan 18, page 33; isoprenoids, U.S. Patent No. 5,569,588; thiazolidinones and 
metathiazanones, U.S. Patent No. 5,549,974; pyrrolidines, U.S. Patent Nos. 5,525,735 and 
5,519,134; morpholino compounds, U.S. Patent No. 5,506,337; benzodiazepines, U.S. Patent 
No. 5,288,514; and the like). 

Devices for the preparation of combinatorial libraries are commercially available (see, 
e.g., 357 MPS, 390 MPS, Advanced Chem Tech, Louisville KY, Symphony, Rainin, 
Woburn, MA, 433A Applied Biosystems, Foster City, CA, 9050 Plus, Millipore, Bedford, 
MA). 

A number of well known robotic systems have also been developed for solution phase 
chemistries. These systems include automated workstations like the automated synthesis 
apparatus developed by Takeda Chemical Industries, LTD. (Osaka, Japan) and many robotic 
systems utilizing robotic arms (Zymate II, Zymark Corporation, Hopkinton, Mass.; Orca, 
Hewlett-Packard, Palo Alto, Calif.), which mimic the manual synthetic operations performed 
by a chemist. Many of the above devices are suitable for use with the present invention. The 
nature and implementation of modifications to these devices (if any) so that they can operate 
as discussed herein will be apparent to persons skilled in the relevant art. In addition, 
numerous combinatorial libraries are themselves commercially available (see, e.g., 
ComGenex, Princeton, N.J., Asinex, Moscow, RU, Tripos, Inc., St. Louis, MO, ChemStar, 
Ltd, Moscow, RU, 3D Pharmaceuticals, Exton, PA, Martek Biosciences, Columbia, MD, 
etc.). 

The assays to identify modulators are amenable to high throughput screening. 
Preferred assays thus detect enhancement or inhibition of prostate cancer gene transcription, 
inhibition or enhancement of polypeptide expression, and inhibition or enhancement of 
polypeptide activity. 

High throughput assays for the presence, absence, quantification, or other properties 
of particular nucleic acids or protein products are well known to those of skill in the art. 
Binding assays and reporter gene assays are similarly well known. Thus, e.g., U.S. 
Patent No. 5,559,410 discloses high throughput screening methods for proteins, U.S. Patent 
No. 5,585,639 discloses high throughput screening methods for nucleic acid binding (i.e., in 
arrays), while U.S. Patent Nos. 5,576,220 and 5,541,061 disclose high throughput methods of 
screening for ligand/antibody binding. 

In addition, high throughput screening systems are commercially available (see, e.g., 
Zymark Corp., Hopkinton, MA; Air Technical Industries, Mentor, OH; Beckman 
Instruments, Inc. Fullerton, CA; Precision Systems, Inc., Natick, MA, etc.). These systems 
typically automate entire procedures, including sample and reagent pipetting, liquid 
dispensing, timed incubations, and final readings of the microplate in detector(s) appropriate 
for the assay. These configurable systems provide high throughput and rapid start up as well 
as a high degree of flexibility and customization. The manufacturers of such systems provide 
detailed protocols for various high throughput systems. Thus, e.g., Zymark Corp. provides 
technical bulletins describing screening systems for detecting the modulation of gene 
transcription, ligand binding, and the like. 

In one embodiment, modulators are proteins, often naturally occurring proteins or 
fragments of naturally occurring proteins. Thus, e.g., cellular extracts containing proteins, or 
random or directed digests of proteinaceous cellular extracts, may be used. In this way 
libraries of proteins may be made for screening in the methods of the invention. Particularly 
preferred in this embodiment are libraries of bacterial, fungal, viral, and mammalian proteins, 
with the latter being preferred, and human proteins being especially preferred. Particularly 
useful test compounds will be directed to the class of proteins to which the target belongs, e.g., 
substrates for enzymes or ligands and receptors. 

In a preferred embodiment, modulators are peptides of from about 5 to about 30 
amino acids, with from about 5 to about 20 amino acids being preferred, and from about 7 to 
about 15 being particularly preferred. The peptides may be digests of naturally occurring 
proteins as is outlined above, random peptides, or "biased" random peptides. By 
"randomized" or grammatical equivalents herein is meant that each nucleic acid and peptide 
consists of essentially random nucleotides and amino acids, respectively. Since generally 
these random peptides (or nucleic acids, discussed below) are chemically synthesized, they 
may typically incorporate any nucleotide or amino acid at any position. The synthetic 
process can be designed to generate randomized proteins or nucleic acids, to allow the 
formation of all or most of the possible combinations over the length of the sequence, thus 
forming a library of randomized candidate bioactive proteinaceous agents. 

In one embodiment, the library is fully randomized, with no sequence preferences or 
constants at any position. In a preferred embodiment, the library is biased. That is, some 
positions within the sequence are either held constant, or are selected from a limited number 
of possibilities. For example, in a preferred embodiment, the nucleotides or amino acid 
residues are randomized within a defined class, e.g., of hydrophobic amino acids, hydrophilic 
residues, sterically biased (either small or large) residues, towards the creation of nucleic acid 
binding domains, the creation of cysteines, for cross-linking, prolines for SH-3 domains, 
serines, threonines, tyrosines, or histidines for phosphorylation sites, etc., or to purines, etc. 
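
The distinction between fully randomized and biased libraries can be sketched in silico as follows. Each position of a template lists the residues allowed there: a single residue holds the position constant, a restricted set biases it, and the full alphabet leaves it fully random. The template representation, the function names, and the particular residues grouped as hydrophobic are illustrative assumptions only.

```python
import random
from typing import List, Sequence

ALL_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"
# Illustrative grouping; other definitions of "hydrophobic" are possible.
HYDROPHOBIC = "AVLIMFW"


def biased_peptide(template: Sequence[str], rng: random.Random) -> str:
    """Draw one peptide, choosing each residue from that position's allowed set."""
    return "".join(rng.choice(allowed) for allowed in template)


def biased_library(template: Sequence[str], size: int, seed: int = 0) -> List[str]:
    """Draw `size` peptides from the template using a seeded generator."""
    rng = random.Random(seed)
    return [biased_peptide(template, rng) for _ in range(size)]


# Example template: constant Cys, two hydrophobic positions flanking a
# fully random position, and a constant Pro.
EXAMPLE_TEMPLATE = ["C", HYDROPHOBIC, ALL_RESIDUES, HYDROPHOBIC, "P"]
```

Calling `biased_library(EXAMPLE_TEMPLATE, 50)` would give 50 five-residue sequences that all begin with Cys and end with Pro, while a template of `[ALL_RESIDUES] * 5` would correspond to the fully randomized case.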

Modulators of prostate cancer can also be nucleic acids, as defined above. 

As described above generally for proteins, nucleic acid modulating agents may be 
naturally occurring nucleic acids, random nucleic acids, or "biased" random nucleic acids. 
For example, digests of prokaryotic or eukaryotic genomes may be used as is outlined above 
for proteins. 

In a preferred embodiment, the candidate compounds are organic chemical moieties, a 
wide variety of which are available in the literature. 

After the candidate agent has been added and the cells allowed to incubate for some 
period of time, the sample containing a target sequence to be analyzed is added to the 
biochip. If required, the target sequence is prepared using known techniques. For example, 
the sample may be treated to lyse the cells, using known lysis buffers, electroporation, etc., 
with purification and/or amplification such as PCR performed as appropriate. For example, 
an in vitro transcription with labels covalently attached to the nucleotides is performed. 
Generally, the nucleic acids are labeled with biotin-FITC or PE, or with cy3 or cy5. 

In a preferred embodiment, the target sequence is labeled with, e.g., a fluorescent, a 
chemiluminescent, a chemical, or a radioactive signal, to provide a means of detecting the 
target sequence's specific binding to a probe. The label also can be an enzyme, such as 
alkaline phosphatase or horseradish peroxidase, which when provided with an appropriate 
substrate produces a product that can be detected. Alternatively, the label can be a labeled 
compound or small molecule, such as an enzyme inhibitor, that binds but is not catalyzed or 
altered by the enzyme. The label also can be a moiety or compound, such as an epitope tag 
or biotin which specifically binds to streptavidin. For the example of biotin, the streptavidin 
is labeled as described above, thereby providing a detectable signal for the bound target 
sequence. Unbound labeled streptavidin is typically removed prior to analysis. 

As will be appreciated by those in the art, these assays can be direct hybridization 
assays or can comprise "sandwich assays", which include the use of multiple probes, as is 
generally outlined in U.S. Patent Nos. 5,681,702, 5,597,909, 5,545,730, 5,594,117, 
5,591,584, 5,571,670, 5,580,731, 5,571,670, 5,591,584, 5,624,802, 5,635,352, 5,594,118, 
5,359,100, 5,124,246, and 5,681,697, each of which is hereby incorporated by reference. In 
this embodiment, in general, the target nucleic acid is prepared as outlined above, and then 
added to the biochip comprising a plurality of nucleic acid probes, under conditions that 
allow the formation of a hybridization complex. 

A variety of hybridization conditions may be used in the present invention, including 
high, moderate, and low stringency conditions as outlined above. The assays are generally 
run under stringency conditions which allow formation of the label probe hybridization 
complex only in the presence of target. Stringency can be controlled by altering a step 
parameter that is a thermodynamic variable, including, but not limited to, temperature, 
formamide concentration, salt concentration, chaotropic salt concentration, pH, organic 
solvent concentration, etc. 

These parameters may also be used to control non-specific binding, as is generally 
outlined in U.S. Patent No. 5,681,697. Thus it may be desirable to perform certain steps at 
higher stringency conditions to reduce non-specific binding. 

The reactions outlined herein may be accomplished in a variety of ways. Components 
of the reaction may be added simultaneously, or sequentially, in different orders, with 
preferred embodiments outlined below. In addition, the reaction may include a variety of 
other reagents. These include salts, buffers, neutral proteins, e.g., albumin, detergents, etc., 
which may be used to facilitate optimal hybridization and detection, and/or reduce 
non-specific or background interactions. Reagents that otherwise improve the efficiency of the 
assay, such as protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., may also be 
used as appropriate, depending on the sample preparation methods and purity of the target. 

The assay data are analyzed to determine the expression levels, and changes in 
expression levels as between states, of individual genes, forming a gene expression profile. 

Screens are performed to identify modulators of the prostate cancer or related 
phenotype. In one embodiment, screening is performed to identify modulators that can 
induce or suppress a particular expression profile, thus preferably generating the associated 
phenotype. In another embodiment, e.g., for diagnostic applications, having identified 
differentially expressed genes important in a particular state, screens can be performed to 
identify modulators that alter expression of individual genes. In another embodiment, 
screening is performed to identify modulators that alter a biological function of the 
expression product of a differentially expressed gene. Again, having identified the 
importance of a gene in a particular state, screens are performed to identify agents that bind 
and/or modulate the biological activity of the gene product. 

In addition, screens can be done for genes that are induced in response to a candidate 
agent. After identifying a modulator based upon its ability to suppress a prostate cancer 
expression pattern leading to a normal expression pattern, or to modulate a single prostate 
cancer gene expression profile so as to mimic the expression of the gene from normal tissue, 
a screen as described above can be performed to identify genes that are specifically 
modulated in response to the agent. Comparing expression profiles between normal tissue 
and agent treated prostate cancer tissue reveals genes that are not expressed in normal tissue 
or prostate cancer tissue, but are expressed in agent treated tissue. These agent-specific 
sequences can be identified and used by methods described herein for prostate cancer genes 
or proteins. In particular these sequences and the proteins they encode find use in marking or 
identifying agent treated cells. In addition, antibodies can be raised against the agent induced 
proteins and used to target novel therapeutics to the treated prostate cancer tissue sample. 
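
The comparison described above amounts to a set difference over expressed genes, which can be sketched as follows. The profile representation (gene identifier mapped to an expression level), the detection threshold, and the gene names used in the usage note are hypothetical.

```python
from typing import Dict, Set


def expressed_genes(profile: Dict[str, float], threshold: float) -> Set[str]:
    """Genes whose level meets the (assumed) detection threshold."""
    return {gene for gene, level in profile.items() if level >= threshold}


def agent_specific_genes(
    normal: Dict[str, float],
    cancer: Dict[str, float],
    treated: Dict[str, float],
    threshold: float = 1.0,
) -> Set[str]:
    """Genes expressed in agent treated tissue but in neither normal
    tissue nor untreated prostate cancer tissue."""
    return (
        expressed_genes(treated, threshold)
        - expressed_genes(normal, threshold)
        - expressed_genes(cancer, threshold)
    )
```

For example, with hypothetical profiles in which `geneC` is detected only in the agent treated sample, `agent_specific_genes` returns `{"geneC"}`, marking it as a candidate agent-specific sequence.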

Thus, in one embodiment, a test compound is administered to a population of prostate 
cancer cells that have an associated prostate cancer expression profile. By "administration" 
or "contacting" herein is meant that the candidate agent is added to the cells in such a manner 
as to allow the agent to act upon the cell, whether by uptake and intracellular action, or by 
action at the cell surface. In some embodiments, nucleic acid encoding a proteinaceous 
candidate agent (e.g., a peptide) may be put into a viral construct such as an adenoviral or 
retroviral construct, and added to the cell, such that expression of the peptide agent is 
accomplished, e.g., PCT US97/01019. Regulatable gene therapy systems can also be used. 

Once the test compound has been administered to the cells, the cells can be washed if 
desired and are allowed to incubate under preferably physiological conditions for some 
period of time. The cells are then harvested and a new gene expression profile is generated, 
as outlined herein. 

Thus, e.g., prostate cancer or non-malignant tissue may be screened for agents that 
modulate, e.g., induce or suppress the prostate cancer or related phenotype. A change in at 
least one gene, preferably many, of the expression profile indicates that the agent has an 
effect on prostate cancer activity. By defining such a signature for the prostate cancer 
phenotype, screens for new drugs that alter the phenotype can be devised. With this 
approach, the drug target need not be known and need not be represented in the original 
expression screening platform, nor does the level of transcript for the target protein need to 
change. 
In a preferred embodiment, as outlined above, screens may be done on individual 
genes and gene products (proteins). That is, having identified a particular differentially 
expressed gene as important in a particular state, screening of modulators of either the 
expression of the gene or the gene product itself can be done. The gene products of 
differentially expressed genes are sometimes referred to herein as "prostate cancer proteins" 
or a "prostate cancer modulatory protein". The prostate cancer modulatory protein may be a 
fragment, or alternatively, be the full length protein to the fragment encoded by the nucleic 
acids of Tables 1A-4. Preferably, the prostate cancer modulatory protein is a fragment. 
In a preferred embodiment, the prostate cancer amino acid sequence which is used to 
determine sequence identity or similarity is encoded by a nucleic acid of Tables 1A-4. In 
another embodiment, the sequences are naturally occurring allelic variants of a protein 
encoded by a nucleic acid of Tables 1A-4. In another embodiment, the sequences are 
sequence variants as further described herein. 

Preferably, the prostate cancer modulatory protein is a fragment of approximately 14 
to 24 amino acids long. More preferably the fragment is a soluble fragment. Preferably, the 
fragment includes a non-transmembrane region. In a preferred embodiment, the fragment has 
an N-terminal Cys to aid in solubility. In one embodiment, the C-terminus of the fragment is 
kept as a free acid and the N-terminus is a free amine to aid in coupling, i.e., to cysteine. 
In one embodiment the prostate cancer proteins are conjugated to an immunogenic 
agent as discussed herein. In one embodiment the prostate cancer protein is conjugated to 
BSA. 

Measurements of prostate cancer polypeptide activity, or of prostate cancer or the 
prostate cancer phenotype can be performed using a variety of assays. For example, the 
effects of the test compounds upon the function of the prostate cancer polypeptides can be 
measured by examining parameters described above. A suitable physiological change that 
affects activity can be used to assess the influence of a test compound on the polypeptides of 
this invention. When the functional consequences are determined using intact cells or 
animals, one can also measure a variety of effects such as, in the case of prostate cancer 
associated with tumors, tumor growth, tumor metastasis, neovascularization, hormone 
release, transcriptional changes to both known and uncharacterized genetic markers (e.g., 
northern blots), changes in cell metabolism such as cell growth or pH changes, and changes 
in intracellular second messengers such as cGMP. In the assays of the invention, a 
mammalian prostate cancer polypeptide is typically used, e.g., mouse, preferably human. 

Assays to identify compounds with modulating activity can be performed in vitro. 
For example, a prostate cancer polypeptide is first contacted with a potential modulator and 
incubated for a suitable amount of time, e.g., from 0.5 to 48 hours. In one embodiment, the 
prostate cancer polypeptide levels are determined in vitro by measuring the level of protein or 
mRNA. The level of protein is measured using immunoassays such as western blotting, 
ELISA, and the like with an antibody that selectively binds to the prostate cancer polypeptide 
or a fragment thereof. For measurement of mRNA, amplification, e.g., using PCR, LCR, or 
hybridization assays, e.g., northern hybridization, RNAse protection, dot blotting, are 
preferred. The level of protein or mRNA is detected using directly or indirectly labeled 
detection agents, e.g., fluorescently or radioactively labeled nucleic acids, radioactively or 
enzymatically labeled antibodies, and the like, as described herein. 

Alternatively, a reporter gene system can be devised using the prostate cancer protein 
promoter operably linked to a reporter gene such as luciferase, green fluorescent protein, 
CAT, or β-gal. The reporter construct is typically transfected into a cell. After treatment 
with a potential modulator, the amount of reporter gene transcription, translation, or activity 
is measured according to standard techniques known to those of skill in the art. 

In a preferred embodiment, as outlined above, screens may be done on individual 
genes and gene products (proteins). That is, having identified a particular differentially 
expressed gene as important in a particular state, screening of modulators of the expression of 
the gene or the gene product itself can be done. The gene products of differentially expressed 
genes are sometimes referred to herein as "prostate cancer proteins." The prostate cancer 
protein may be a fragment, or alternatively, be the full length protein corresponding to a 
fragment shown herein. 

In one embodiment, screening for modulators of expression of specific genes is 
performed. Typically, the expression of only one or a few genes is evaluated. In another 
embodiment, screens are designed to first find compounds that bind to differentially 
expressed proteins. These compounds are then evaluated for the ability to modulate 
differentially expressed activity. Moreover, once initial candidate compounds are identified, 
variants can be further screened to better evaluate structure-activity relationships. 
In a preferred embodiment, binding assays are done. In general, purified or isolated 
gene product is used; that is, the gene products of one or more differentially expressed 
nucleic acids are made. For example, antibodies are generated to the protein gene products, 
and standard immunoassays are run to determine the amount of protein present. 
Alternatively, cells comprising the prostate cancer proteins can be used in the assays. 

Thus, in a preferred embodiment, the methods comprise combining a prostate cancer 
protein and a candidate compound, and determining the binding of the compound to the 
prostate cancer protein. Preferred embodiments utilize the human prostate cancer protein, 
although other mammalian proteins may also be used, e.g., for the development of animal 
models of human disease. In some embodiments, as outlined herein, variant or derivative 
prostate cancer proteins may be used. 

Generally, in a preferred embodiment of the methods herein, the prostate cancer 
protein or the candidate agent is non-diffusibly bound to an insoluble support having isolated 
sample receiving areas (e.g., a microtiter plate, an array, etc.). The insoluble supports may be 
made of any composition to which the compositions can be bound, that is readily separated 
from soluble material, and that is otherwise compatible with the overall method of screening. 
The surface of such supports may be solid or porous and of a convenient shape. Examples of 
suitable insoluble supports include microtiter plates, arrays, membranes, and beads. These 
are typically made of glass, plastic (e.g., polystyrene), polysaccharides, nylon or 
nitrocellulose, Teflon™, etc. Microtiter plates and arrays are especially convenient because a 
large number of assays can be carried out simultaneously, using small amounts of reagents 
and samples. The particular manner of binding of the composition should be compatible with 
the reagents and overall methods of the invention, maintain the activity of the composition, 
and be nondiffusible. Preferred methods of binding include the use of antibodies (which do 
not sterically block either the ligand binding site or activation sequence when the protein is 
bound to the support), direct binding to "sticky" or ionic supports, chemical crosslinking, the 
synthesis of the protein or agent on the surface, etc. Following binding of the protein or 
agent, excess unbound material is removed by washing. The sample receiving areas may 
then be blocked through incubation with bovine serum albumin (BSA), casein, or other 
innocuous protein or other moiety. 

In a preferred embodiment, the prostate cancer protein is bound to the support, and a 
test compound is added to the assay. Alternatively, the candidate agent is bound to the 
support and the prostate cancer protein is added. Novel binding agents include specific 
antibodies, non-natural binding agents identified in screens of chemical libraries, peptide 
analogs, etc. Of particular interest are screening assays for agents that have a low toxicity for 
human cells. A wide variety of assays may be used for this purpose, including labeled in 
vitro protein-protein binding assays, electrophoretic mobility shift assays, immunoassays for 
protein binding, functional assays (phosphorylation assays, etc.) and the like. 

The determination of the binding of the test modulating compound to the prostate 
cancer protein may be done in a number of ways. In a preferred embodiment, the compound 
is labeled, and binding determined directly, e.g., by attaching all or a portion of the prostate 
cancer protein to a solid support, adding a labeled candidate agent (e.g., a fluorescent label), 
washing off excess reagent, and determining whether the label is present on the solid support. 
Various blocking and washing steps may be utilized as appropriate. 

In some embodiments, only one of the components is labeled, e.g., the proteins (or 
proteinaceous candidate compounds) can be labeled. Alternatively, more than one 
component can be labeled with different labels, e.g., ¹²⁵I for the proteins and a fluorophor for 
the compound. Proximity reagents, e.g., quenching or energy transfer reagents are also 
useful. 

In one embodiment, the binding of the test compound is determined by competitive 
binding assay. The competitor is a binding moiety known to bind to the target molecule (i.e., 
a prostate cancer protein), such as an antibody, peptide, binding partner, ligand, etc. Under 
certain circumstances, there may be competitive binding between the compound and the 
binding moiety, with the binding moiety displacing the compound. In one embodiment, the 
test compound is labeled. Either the compound, or the competitor, or both, is added first to 
the protein for a time sufficient to allow binding, if present. Incubations may be performed at 
a temperature which facilitates optimal activity, typically between 4 and 40°C. Incubation 
periods are typically optimized, e.g., to facilitate rapid high throughput screening. Typically 
between 0.1 and 1 hour will be sufficient. Excess reagent is generally removed or washed 
away. The second component is then added, and the presence or absence of the labeled 
component is followed, to indicate binding. 

In a preferred embodiment, the competitor is added first, followed by the test 
compound. Displacement of the competitor is an indication that the test compound is binding 
to the prostate cancer protein and thus is capable of binding to, and potentially modulating, 
the activity of the prostate cancer protein. In this embodiment, either component can be 
labeled. Thus, e.g., if the competitor is labeled, the presence of label in the wash solution 
indicates displacement by the agent. Alternatively, if the test compound is labeled, the 
presence of the label on the support indicates displacement. 
In an alternative embodiment, the test compound is added first, with incubation and 
washing, followed by the competitor. The absence of binding by the competitor may indicate 
that the test compound is bound to the prostate cancer protein with a higher affinity. Thus, if 
the test compound is labeled, the presence of the label on the support, coupled with a lack of 
competitor binding, may indicate that the test compound is capable of binding to the prostate 
cancer protein. 

In a preferred embodiment, the methods comprise differential screening to identify 
agents that are capable of modulating the activity of the prostate cancer proteins. In this 
embodiment, the methods comprise combining a prostate cancer protein and a competitor in a 
first sample. A second sample comprises a test compound, a prostate cancer protein, and a 
competitor. The binding of the competitor is determined for both samples, and a change, or 
difference in binding between the two samples indicates the presence of an agent capable of 
binding to the prostate cancer protein and potentially modulating its activity. That is, if the 
binding of the competitor is different in the second sample relative to the first sample, the 
agent is capable of binding to the prostate cancer protein. 
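
As a rough illustration of the two-sample comparison, the sketch below averages replicate 
measurements of labeled competitor binding with and without the test compound and reports 
the fractional change. The function names, the replicate values, and the 50% cutoff are 
assumptions introduced here for illustration; the specification does not prescribe a 
particular threshold. 

```python
from statistics import mean


def competitor_binding_change(first_sample, second_sample):
    """Fractional change in competitor binding caused by the test compound.

    `first_sample`: replicate signals for protein + competitor.
    `second_sample`: replicate signals for protein + competitor + test compound.
    A negative value means less competitor remained bound.
    """
    baseline = mean(first_sample)
    with_compound = mean(second_sample)
    return (with_compound - baseline) / baseline


def is_candidate_binder(first_sample, second_sample, min_change=0.5):
    """Flag the compound when binding differs by at least `min_change` (assumed cutoff)."""
    return abs(competitor_binding_change(first_sample, second_sample)) >= min_change


# Hypothetical triplicate counts of bound, labeled competitor.
first = [1000.0, 1100.0, 900.0]
second = [300.0, 250.0, 350.0]

change = competitor_binding_change(first, second)
print(f"change in competitor binding: {change:+.0%}")
print("candidate binder:", is_candidate_binder(first, second))
```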

Alternatively, differential screening is used to identify drug candidates that bind to the 
native prostate cancer protein, but cannot bind to modified prostate cancer proteins. The 
structure of the prostate cancer protein may be modeled, and used in rational drug design to 
synthesize agents that interact with that site. Drug candidates that affect the activity of a 
prostate cancer protein are also identified by screening drugs for the ability to either enhance 
or reduce the activity of the protein. 

Positive controls and negative controls may be used in the assays. Preferably control 
and test samples are performed in at least triplicate to obtain statistically significant results. 
Incubation of samples is for a time sufficient for the binding of the agent to the protein. 
Following incubation, samples are washed free of non-specifically bound material and the 
amount of bound, generally labeled agent determined. For example, where a radiolabel is 
employed, the samples may be counted in a scintillation counter to determine the amount of 
bound compound. 
A variety of other reagents may be included in the screening assays. These include 
reagents like salts, neutral proteins, e.g., albumin, detergents, etc., which may be used to 
facilitate optimal protein-protein binding and/or reduce non-specific or background 
interactions. Also reagents that otherwise improve the efficiency of the assay, such as 
protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., may be used. The mixture 
of components may be added in an order that provides for the requisite binding. 

In a preferred embodiment, the invention provides methods for screening for a 
compound capable of modulating the activity of a prostate cancer protein. The methods 
comprise adding a test compound, as defined above, to a cell comprising prostate cancer 
proteins. Preferred cell types include almost any cell. The cells contain a recombinant 
nucleic acid that encodes a prostate cancer protein. In a preferred embodiment, a library of 
candidate agents is tested on a plurality of cells. 

In one aspect, the assays are evaluated in the presence or absence of, or previous or 
subsequent exposure to, physiological signals, e.g., hormones, antibodies, peptides, antigens, 
cytokines, growth factors, action potentials, pharmacological agents including 
chemotherapeutics, radiation, carcinogens, or other cells (e.g., cell-cell contacts). In 
another example, the determinations are made at different stages of the cell cycle 
process. 

In this way, compounds that modulate prostate cancer agents are identified. 
Compounds with pharmacological activity are able to enhance or interfere with the activity of 
the prostate cancer protein. Once identified, similar structures are evaluated to identify 
critical structural features of the compound. 

In one embodiment, a method of inhibiting prostate cancer cell division is provided. 
The method comprises administration of a prostate cancer inhibitor. In another embodiment, 
a method of inhibiting prostate cancer or other prostate proliferative condition is provided. 
The method comprises administration of a prostate cancer inhibitor. In a further 
embodiment, methods of treating cells or individuals with prostate cancer are provided. The 
method comprises administration of a prostate cancer inhibitor. 

In one embodiment, a prostate cancer inhibitor is an antibody as discussed above. In 
another embodiment, the prostate cancer inhibitor is an antisense molecule. 

A variety of cell growth, proliferation, and metastasis assays are known to those of 
skill in the art, as described below. 
Soft agar growth or colony formation in suspension 

Normal cells require a solid substrate to attach and grow. When the cells are 
transformed, they lose this phenotype and grow detached from the substrate. For example, 
transformed cells can grow in stirred suspension culture or suspended in semi-solid media, 
such as semi-solid or soft agar. The transformed cells, when transfected with tumor 
suppressor genes, regenerate a normal phenotype and require a solid substrate to attach and 
grow. Soft agar growth or colony formation in suspension assays can be used to identify 
modulators of prostate cancer sequences, which when expressed in host cells, inhibit 
abnormal cellular proliferation and transformation. A therapeutic compound would reduce or 
eliminate the host cells' ability to grow in stirred suspension culture or suspended in 
semi-solid media, such as semi-solid or soft agar. 

Techniques for soft agar growth or colony formation in suspension assays are 
described in Freshney (1994) Culture of Animal Cells: A Manual of Basic Technique 3d ed. 
Wiley-Liss, herein incorporated by reference. See also, the methods section of Garkavtsev, et 
al. (1996), supra, herein incorporated by reference. 
Contact inhibition and density limitation of growth 

Normal cells typically grow in a flat and organized pattern in a petri dish until they 
touch other cells. When the cells touch one another, they are contact inhibited and stop 
growing. When cells are transformed, however, the cells are not contact inhibited and 
continue to grow to high densities in disorganized foci. Thus, the transformed cells grow to a 
higher saturation density than normal cells. This can be detected morphologically by the 
formation of a disoriented monolayer of cells or rounded cells in foci within the regular 
pattern of normal surrounding cells. Alternatively, labeling index with (³H)-thymidine at 
saturation density can be used to measure density limitation of growth. See Freshney (1994), 
supra. The transformed cells, when transfected with tumor suppressor genes, regenerate a 
normal phenotype and become contact inhibited and would grow to a lower density. 

In this assay, labeling index with (³H)-thymidine at saturation density is a preferred 
method of measuring density limitation of growth. Transformed host cells are transfected 
with a prostate cancer-associated sequence and are grown for 24 hours at saturation density in 
non-limiting medium conditions. The percentage of cells labeling with (³H)-thymidine is 
determined autoradiographically. See, Freshney (1994), supra. 
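
The labeling index is simply the percentage of scored cells that carry label. A minimal 
sketch of that arithmetic follows; the function name and the cell counts are hypothetical 
and are used only to show the calculation. 

```python
def labeling_index(labeled_cells, total_cells):
    """Percentage of scored cells that incorporated the radiolabel."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= labeled_cells <= total_cells:
        raise ValueError("labeled_cells must lie between 0 and total_cells")
    return 100.0 * labeled_cells / total_cells


# Hypothetical autoradiography counts at saturation density.
transformed_index = labeling_index(420, 1000)
transfected_index = labeling_index(60, 1000)

print(f"transformed host cells: {transformed_index:.1f}% labeled")
print(f"after transfection:     {transfected_index:.1f}% labeled")
```

A lower index after transfection would be consistent with restored density limitation of 
growth, subject to appropriate controls. 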
Growth factor or serum dependence 
Transformed cells have a lower serum dependence than their normal counterparts 
(see, e.g., Temin (1966) J. Natl. Cancer Inst. 37:167-175; Eagle, et al. (1970) J. Exp. Med. 
131:836-879); Freshney, supra. This is in part due to release of various growth factors by the 
transformed cells. Growth factor or serum dependence of transformed host cells can be 
compared with that of control. 
Tumor-specific marker levels 

Tumor cells release a greater amount of certain factors (hereinafter "tumor 
specific markers") than their normal counterparts. For example, plasminogen activator (PA) 
is released from human glioma at a higher level than from normal brain cells (see, e.g., 
Gullino, "Angiogenesis, tumor vascularization, and potential interference with tumor growth" 
pp. 178-184 in Mihich (ed. 1985) Biological Responses in Cancer Plenum). Similarly, tumor 
angiogenesis factor (TAF) is released at a higher level in tumor cells than their normal 
counterparts. See, e.g., Folkman (1992) Angiogenesis and Cancer. Sem. Cancer Biol. 
Various techniques which measure the release of these factors are described in 
Freshney (1994), supra. Also, see, Unkless, et al. (1974) J. Biol. Chem. 249:4295-4305; 
Strickland and Beers (1976) J. Biol. Chem. 251:5694-5702; Whur, et al. (1980) Br. J. Cancer 
42:305-312; Gullino, "Angiogenesis, tumor vascularization, and potential interference with 
tumor growth" pp. 178-184 in Mihich (ed. 1985) Biological Responses in Cancer Plenum; 
and Freshney (1985) Anticancer Res. 5:111-130. 

Invasiveness into Matrigel 

The degree of invasiveness into Matrigel or some other extracellular matrix 
constituent can be used as an assay to identify compounds that modulate prostate 
cancer-associated sequences. Tumor cells exhibit a good correlation between malignancy and 
invasiveness of cells into Matrigel or some other extracellular matrix constituent. In this 
assay, tumorigenic cells are typically used as host cells. Expression of a tumor suppressor 
gene in these host cells would decrease invasiveness of the host cells. 

Techniques described in Freshney (1994), supra, can be used. Briefly, the level of 
invasion of host cells can be measured by using filters coated with Matrigel or some other 
extracellular matrix constituent. Penetration into the gel, or through to the distal side of the 
filter, is rated as invasiveness, and rated histologically by number of cells and distance 
moved, or by prelabeling the cells with ¹²⁵I and counting the radioactivity on the distal side of 
the filter or bottom of the dish. See, e.g., Freshney (1984), supra. 
Tumor growth in vivo 

Effects of prostate cancer-associated sequences on cell growth can be tested in 
transgenic or immune-suppressed mice. Knock-out transgenic mice can be made, in which 
the prostate cancer gene is disrupted or in which a prostate cancer gene is inserted. 
Knock-out transgenic mice can be made by insertion of a marker gene or other heterologous gene 
into the endogenous prostate cancer gene site in the mouse genome via homologous 
recombination. Such mice can also be made by substituting the endogenous prostate cancer 
gene with a mutated version of the prostate cancer gene, or by mutating the endogenous 
prostate cancer gene, e.g., by exposure to carcinogens. 

A DNA construct is introduced into the nuclei of embryonic stem cells. Cells 
containing the newly engineered genetic lesion are injected into a host mouse embryo, which 
is re-implanted into a recipient female. Some of these embryos develop into chimeric mice 
that possess germ cells partially derived from the mutant cell line. Therefore, by breeding the 
chimeric mice it is possible to obtain a new line of mice containing the introduced genetic 
lesion (see, e.g., Capecchi, et al. (1989) Science 244:1288-1292). Chimeric targeted mice can 
be derived according to Hogan, et al. (1988) Manipulating the Mouse Embryo: A Laboratory 
Manual CSH Press; and Robertson (ed. 1987) Teratocarcinomas and Embryonic Stem Cells: 
A Practical Approach IRL Press, Washington, D.C. 

Alternatively, various immune-suppressed or immune-deficient host animals can be 
used. For example, a genetically athymic "nude" mouse (see, e.g., Giovanella, et al. (1974) J. 
Natl. Cancer Inst. 52:921-930), a SCID mouse, a thymectomized mouse, or an irradiated 
mouse (see, e.g., Bradley, et al. (1978) Br. J. Cancer 38:263-272; Selby, et al. (1980) Br. J. 
Cancer 41:52-61) can be used as a host. Transplantable tumor cells (typically about 10^ cells) 
injected into isogenic hosts will produce invasive tumors in a high proportion of cases, while 
normal cells of similar origin will not. In hosts which developed invasive tumors, cells 
expressing a prostate cancer-associated sequence are injected subcutaneously. After a 
suitable length of time, preferably 4-8 weeks, tumor growth is measured (e.g., by volume or 
by its two largest dimensions) and compared to the control. Tumors that have statistically 
significant reduction (using, e.g., Student's t test) are said to have inhibited growth. 
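
The measurement and comparison step can be sketched as follows. The volume estimate 
V = L x W^2 / 2 from the two largest dimensions is a commonly used caliper approximation 
and is an assumption here, as are the function names and the measurements; the 
specification itself names only "volume or ... two largest dimensions" and Student's t test. 

```python
from math import sqrt
from statistics import mean, variance


def tumor_volume(length_mm, width_mm):
    """Caliper estimate V = L * W**2 / 2 (assumed approximation), in mm^3."""
    return length_mm * width_mm ** 2 / 2.0


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). The caller compares |t| with the
    critical value for the chosen significance level.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return t, df


# Hypothetical (length, width) measurements in mm after the growth period.
control_dims = [(12.0, 10.0), (14.0, 10.0), (13.0, 11.0), (15.0, 10.0)]
treated_dims = [(8.0, 6.0), (9.0, 7.0), (7.0, 6.0), (8.0, 7.0)]

control_vol = [tumor_volume(l, w) for l, w in control_dims]
treated_vol = [tumor_volume(l, w) for l, w in treated_dims]

t_stat, df = student_t(control_vol, treated_vol)
# Two-sided critical value at alpha = 0.05 for 6 degrees of freedom.
T_CRITICAL_DF6 = 2.447
print(f"t = {t_stat:.2f}, df = {df}, significant: {abs(t_stat) > T_CRITICAL_DF6}")
```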



Polynucleotide modulators of prostate cancer 
Antisense and RNAi Polynucleotides 
In certain embodiments, the activity of a prostate cancer-associated protein is 
down-regulated, or entirely inhibited, by the use of antisense polynucleotide, i.e., a nucleic acid 
complementary to, and which can preferably hybridize specifically to, a coding mRNA 
nucleic acid sequence, e.g., a prostate cancer protein mRNA, or a subsequence thereof. 
Binding of the antisense polynucleotide to the mRNA reduces the translation and/or stability 
of the mRNA. 

In the context of this invention, antisense polynucleotides can comprise 
naturally-occurring nucleotides, or synthetic species formed from naturally-occurring subunits or their 
close homologs. Antisense polynucleotides may also have altered sugar moieties or 
inter-sugar linkages. Exemplary among these are the phosphorothioate and other sulfur containing 
species which are known for use in the art. Analogs are comprehended by this invention so 
long as they function effectively to hybridize with the prostate cancer protein mRNA. See, 
e.g., Isis Pharmaceuticals, Carlsbad, CA; Sequitor, Inc., Natick, MA. 

Such antisense polynucleotides can readily be synthesized using recombinant means, 
or can be synthesized in vitro. Equipment for such synthesis is sold by several vendors, 
including Applied Biosystems. The preparation of other oligonucleotides such as 
phosphorothioates and alkylated derivatives is also well known to those of skill in the art. 

Antisense molecules as used herein include antisense or sense oligonucleotides. 
Sense oligonucleotides can, e.g., be employed to block transcription by binding to the 
anti-sense strand. The antisense and sense oligonucleotides comprise a single-stranded nucleic 
acid sequence (either RNA or DNA) capable of binding to target mRNA (sense) or DNA 
(antisense) sequences for prostate cancer molecules. A preferred antisense molecule is for a 
prostate cancer sequence in Tables 1A-4, or for a ligand or activator thereof. Antisense or 
sense oligonucleotides, according to the present invention, comprise a fragment generally at 
least about 14 nucleotides, preferably from about 14 to 30 nucleotides. The ability to derive 
an antisense or a sense oligonucleotide, based upon a cDNA sequence encoding a given 
protein is described in, e.g., Stein and Cohen (1988) Cancer Res. 48:2659-2668; and van der 
Krol, et al. (1988) BioTechniques 6:958-976. 
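
As a simple illustration of deriving antisense sequences from a cDNA, the sketch below 
enumerates every window of a chosen length (within the 14 to 30 nucleotide range stated 
above) and returns its reverse complement as a DNA oligonucleotide. The function names and 
the example cDNA are hypothetical; selection among candidates (e.g., for specificity or 
target accessibility) is outside the scope of the sketch. 

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_candidates(cdna, length=20):
    """List (start_index, antisense_oligo) for each window of the coding strand.

    `length` is restricted to the 14-30 nt range given in the text.
    """
    if not 14 <= length <= 30:
        raise ValueError("length must be between 14 and 30 nucleotides")
    cdna = cdna.upper()
    return [
        (start, reverse_complement(cdna[start:start + length]))
        for start in range(len(cdna) - length + 1)
    ]


# Hypothetical 20-nt stretch of coding-strand cDNA.
example_cdna = "ATGGCCAAGCTTGACCTGAA"
candidates = antisense_candidates(example_cdna, length=14)

for start, oligo in candidates:
    print(start, oligo)
```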

RNA interference is a mechanism to suppress gene expression in a sequence specific 
manner. See, e.g., Brummelkamp, et al. (2002) Sciencexpress (21 March 2002); Sharp (1999) 
Genes Dev. 13:139-141; and Carthew (2001) Curr. Op. Cell Biol. 13:244-248. In mammalian 
cells, short, e.g., 21 nt, double stranded small interfering RNAs (siRNA) have been shown to 
be effective at inducing an RNAi response. See, e.g., Elbashir, et al. (2001) Nature 411:494-498. 
The mechanism may be used to downregulate expression levels of identified genes, e.g., 
for treatment of, or validation of relevance to, disease. 

Ribozymes 

In addition to antisense polynucleotides, ribozymes can be used to target and inhibit 
transcription of prostate cancer-associated nucleotide sequences. A ribozyme is an RNA 
molecule that catalytically cleaves other RNA molecules. Different kinds of ribozymes have 
been described, including group I ribozymes, hammerhead ribozymes, hairpin ribozymes, 
RNase P, and axhead ribozymes (see, e.g., Castanotto, et al. (1994) Adv. in Pharmacology 
25:289-317 for a general review of the properties of different ribozymes). 

The general features of hairpin ribozymes are described, e.g., in Hampel, et al. (1990) 
Nucl. Acids Res. 18:299-304; European Patent Publication No. 0 360 257; U.S. Patent No. 
5,254,678. Methods of preparing are well known to those of skill in the art. See, e.g., WO 
94/26877; Ojwang, et al. (1993) Proc. Natl. Acad. Sci. USA 90:6340-6344; Yamada, et al. 
(1994) Human Gene Therapy 1:39-45; Leavitt, et al. (1995) Proc. Natl. Acad. Sci. USA 
92:699-703; Leavitt, et al. (1994) Human Gene Therapy 5:1151-120; and Yamada, et al. 
(1994) Virology 205:121-126. 

Polynucleotide modulators of prostate cancer may be introduced into a cell containing 
the target nucleotide sequence by formation of a conjugate with a ligand binding molecule, as 
described in WO 91/04753. Suitable ligand binding molecules include, but are not limited to, 
cell surface receptors, growth factors, other cytokines, or other ligands that bind to cell 
surface receptors. Preferably, conjugation of the ligand binding molecule does not 
substantially interfere with the ability of the ligand binding molecule to bind to its 
corresponding molecule or receptor, or block entry of the sense or antisense oligonucleotide 
or its conjugated version into the cell. Alternatively, a polynucleotide modulator of prostate 
cancer may be introduced into a cell containing the target nucleic acid sequence, e.g., by 
formation of a polynucleotide-lipid complex, as described in WO 90/10448. It is 
understood that antisense molecules or knock-out and knock-in models may also be 
used in screening assays as discussed above, in addition to methods of treatment. 

Thus, in one embodiment, methods of modulating prostate disorders, e.g., cancer in 
cells or organisms, are provided. In one embodiment, the methods comprise administering to 
a patient, e.g., to a cell within the patient, an anti-prostate cancer antibody that reduces or 
eliminates the biological activity of an endogenous prostate cancer protein. Alternatively, the 
methods comprise administering to a cell or organism a recombinant nucleic acid encoding a 
prostate cancer protein. This may be accomplished in many ways. In a preferred 
embodiment, e.g., when the prostate cancer sequence is down-regulated in prostate cancer, 
such state may be reversed by increasing the amount of prostate cancer gene product in the 
cell. This can be accomplished, e.g., by overexpressing the endogenous prostate cancer gene 
or administering a gene encoding the prostate cancer sequence, using known gene-therapy 
techniques. In a preferred embodiment, the gene therapy techniques include the 
incorporation of the exogenous gene using enhanced homologous recombination (EHR), e.g., 
as described in PCT/US93/03868, hereby incorporated by reference in its entirety. 
Alternatively, e.g., when the prostate cancer sequence is up-regulated in prostate cancer, the 
activity of the endogenous prostate cancer gene is decreased, e.g., by the administration of a 
prostate cancer antisense nucleic acid. 

In one embodiment, the prostate cancer proteins of the present invention may be used 
to generate polyclonal and monoclonal antibodies to prostate cancer proteins. Similarly, the 
prostate cancer proteins can be coupled, using standard technology, to affinity 
chromatography columns. These columns may then be used to purify prostate cancer 
antibodies useful for production, diagnostic, or therapeutic purposes. In a preferred 
embodiment, the antibodies are generated to epitopes unique to a prostate cancer protein; that 
is, the antibodies show little or no cross-reactivity to other proteins. The prostate cancer 
antibodies may be coupled to standard affinity chromatography columns and used to purify 
prostate cancer proteins. The antibodies may also be used as blocking polypeptides, as 
outlined above, since they will specifically bind to the prostate cancer protein. 


Methods of identifying variant prostate cancer-associated sequences 

Without being bound by theory, expression of various prostate cancer sequences is 
correlated with prostate cancer or other prostate disorders. Accordingly, disorders based on 
mutant or variant prostate cancer genes may be determined. In one embodiment, the 
invention provides methods for identifying cells containing variant prostate cancer genes, 
e.g., determining all or part of the sequence of at least one endogenous prostate cancer gene 
in a cell. This may be accomplished using many sequencing techniques. In a preferred 
embodiment, the invention provides methods of identifying the prostate cancer genotype of 
an individual, e.g., determining all or part of the sequence of at least one prostate cancer gene 
of the individual. This is generally done in at least one tissue of the individual, and may 
include the evaluation of a number of tissues or different samples of the same tissue. The 
method may include comparing the sequence of the sequenced prostate cancer gene to a 
known prostate cancer gene, e.g., a wild-type gene. 

The sequence of all or part of the prostate cancer gene can then be compared to the 
sequence of a known prostate cancer gene to determine if differences exist. This can be done 
using many known homology programs, such as Bestfit, etc. In a preferred embodiment, the 
presence of a difference in the sequence between the prostate cancer gene of the patient and 
the known prostate cancer gene correlates with a disease state or a propensity for a disease 
state, as outlined herein. 
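
As an illustration only, the comparison of a patient sequence with a known sequence can be 
sketched as a position-by-position check. The sketch below is in Python and assumes that the 
two sequences have already been aligned to equal length (e.g., by a homology program); the 
function name and the example fragments are hypothetical and are not taken from the 
specification. 

```python
def sequence_differences(patient_seq, reference_seq):
    """Return (position, reference_base, patient_base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 1-based.
    """
    if len(patient_seq) != len(reference_seq):
        raise ValueError("sequences must be aligned to equal length")
    pairs = zip(reference_seq.upper(), patient_seq.upper())
    return [(i, ref, pat) for i, (ref, pat) in enumerate(pairs, start=1) if ref != pat]


# Hypothetical fragments, for illustration only.
reference = "ATGGCCTTAGGC"
patient = "ATGGCATTAGGC"
differences = sequence_differences(patient, reference)
```

An empty result would indicate no difference from the known sequence over the compared 
region; any entry marks a position that may warrant correlation with a disease state. 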

In a preferred embodiment, the prostate cancer genes are used as probes to determine 
the number of copies of the prostate cancer gene in the genome. 

In another preferred embodiment, the prostate cancer genes are used as probes to 
determine the chromosomal localization of the prostate cancer genes. Information such as 
chromosomal localization finds use in providing a diagnosis or prognosis, in particular when 
chromosomal abnormalities such as translocations and the like are identified in the prostate 
cancer gene locus. 

Administration of pharmaceutical and vaccine compositions 

In one embodiment, a therapeutically effective dose of a prostate cancer protein or 
modulator thereof is administered to a patient. By "therapeutically effective dose" herein is 
meant a dose that produces effects for which it is administered. The exact dose will depend 
on the purpose of the treatment, and will be ascertainable by one skilled in the art using 
known techniques (e.g., Ansel, et al. (1992) Pharmaceutical Dosage Forms and Drug 
Delivery; Lieberman (1993) Pharmaceutical Dosage Forms (vols. 1-3, Dekker, ISBN 
0824770846, 082476918X, 0824712692, 0824716981); Lloyd (1999) The Art, Science and 
Technology of Pharmaceutical Compounding Amer. Pharma. Assn.; and Pickar (1999) 
Dosage Calculations Thomson). Adjustments for prostate cancer degradation, systemic 
versus localized delivery, and rate of new protease synthesis, as well as the age, body weight, 
general health, sex, diet, time of administration, drug interaction, and the severity of the 
condition may be necessary, and will be ascertainable with routine experimentation by those 
skilled in the art. U.S. Patent Application No. 09/687,576, which further discloses the use of 
compositions and methods of diagnosis and treatment in prostate cancer, is hereby expressly 
incorporated by reference. 

A "patient" for the purposes of the present invention includes both humans and other 
animals, particularly mammals. Thus the methods are applicable to both human therapy and 
veterinary applications. In the preferred embodiment the patient is a mammal, preferably a 
primate, and in the most preferred embodiment the patient is human. The patient typically 
will suffer from a prostate proliferative disorder, e.g., malignant or non-malignant, and may 
include cancer or other related conditions or disorders. 

The administration of the prostate cancer proteins and modulators thereof of the 
present invention can be done in a variety of ways as discussed above, including, but not 
limited to, orally, subcutaneously, intravenously, intranasally, transdermally, 
intraperitoneally, intramuscularly, intrapulmonary, vaginally, rectally, or intraocularly. In 
some instances, e.g., in the treatment of wounds and inflammation, the prostate cancer 
proteins and modulators may be directly applied as a solution or spray, or via catheter. 

The pharmaceutical compositions of the present invention comprise a prostate cancer 
protein in a form suitable for administration to a patient. In the preferred embodiment, the 
pharmaceutical compositions are in a water soluble form, such as being present as 
pharmaceutically acceptable salts, which is meant to include both acid and base addition 
salts. "Pharmaceutically acceptable acid addition salt" refers to those salts that retain the 
biological effectiveness of the free bases and that are not biologically or otherwise 
undesirable, formed with inorganic acids such as hydrochloric acid, hydrobromic acid, 
sulfuric acid, nitric acid, phosphoric acid and the like, and organic acids such as acetic acid, 
propionic acid, glycolic acid, pyruvic acid, oxalic acid, maleic acid, malonic acid, succinic 
acid, fumaric acid, tartaric acid, citric acid, benzoic acid, cinnamic acid, mandelic acid, 
methanesulfonic acid, ethanesulfonic acid, p-toluenesulfonic acid, salicylic acid, and the like. 
"Pharmaceutically acceptable base addition salts" include those derived from inorganic bases 
such as sodium, potassium, lithium, ammonium, calcium, magnesium, iron, zinc, copper, 
manganese, aluminum salts, and the like. Particularly preferred are the ammonium, 
potassium, sodium, calcium, and magnesium salts. Salts derived from pharmaceutically 
acceptable organic non-toxic bases include salts of primary, secondary, and tertiary amines, 
substituted amines including naturally occurring substituted amines, cyclic amines and basic 
ion exchange resins, such as isopropylamine, trimethylamine, diethylamine, triethylamine, 
tripropylamine, and ethanolamine. 

The pharmaceutical compositions may also include one or more of the following: 
carrier proteins such as serum albumin; buffers; fillers such as microcrystalline cellulose, 
lactose, corn and other starches; binding agents; sweeteners and other flavoring agents; 
coloring agents; and polyethylene glycol. 

The pharmaceutical compositions can be administered in a variety of unit dosage 
forms depending upon the method of administration. For example, unit dosage forms 
suitable for oral administration include, but are not limited to, powder, tablets, pills, capsules 
and lozenges. It is recognized that prostate cancer protein modulators (e.g., antibodies, 
antisense constructs, ribozymes, small organic molecules, etc.), when administered orally, 
should be protected from digestion. This is typically accomplished either by complexing the 
molecule(s) with a composition to render it resistant to acidic and enzymatic hydrolysis, or by 
packaging the molecule(s) in an appropriately resistant carrier, such as a liposome or a 
protection barrier. Means of protecting agents from digestion are well known in the art. 

The compositions for administration will commonly comprise a prostate cancer 
protein modulator dissolved in a pharmaceutically acceptable carrier, preferably an aqueous 
carrier. A variety of aqueous carriers can be used, e.g., buffered saline and the like. These 
solutions are typically sterile and generally free of undesirable matter. These compositions 
may be sterilized by conventional, well known sterilization techniques. The compositions 
may contain pharmaceutically acceptable auxiliary substances as required to approximate 
physiological conditions such as pH adjusting and buffering agents, toxicity adjusting agents 
and the like, e.g., sodium acetate, sodium chloride, potassium chloride, calcium chloride, 
sodium lactate, and the like. The concentration of active agent in these formulations can vary 
widely, and will be selected primarily based on fluid volumes, viscosities, body weight, and 
the like in accordance with the particular mode of administration selected and the patient's 
needs (e.g., (1980) Remington's Pharmaceutical Science (15th ed.); and Hardman, et al. (eds. 
2001) Goodman & Gilman: The Pharmacological Basis of Therapeutics McGraw-Hill). 

Thus, a typical pharmaceutical composition for intravenous administration would be 
about 0.1 to 10 mg per patient per day. Dosages from 0.1 up to about 100 mg per patient per 
day may be used, particularly when the drug is administered to a secluded site and not into 
the blood stream, such as into a body cavity or into a lumen of an organ. Substantially higher 
dosages are possible in topical administration. Actual methods for preparing parenterally 
administrable compositions will be known or apparent to those skilled in the art, e.g., 
Remington's Pharmaceutical Science and Goodman and Gilman: The Pharmacological Basis 
of Therapeutics, supra. 

The compositions containing modulators of prostate cancer proteins can be 
administered for therapeutic or prophylactic treatments. In therapeutic applications, 
compositions are administered to a patient suffering from a disease (e.g., a cancer) in an 
amount sufficient to cure or at least partially retard or arrest the disease and its complications. 
An amount adequate to accomplish this is defined as a "therapeutically effective dose." 
Amounts effective for this use will depend upon the severity of the disease and the general 
state of the patient's health. Single or multiple administrations of the compositions may be 
administered depending on the dosage and frequency as required and tolerated by the patient. 
The composition should provide a sufficient quantity of the agents of this invention to 
effectively treat the patient. An amount of modulator that is capable of preventing or slowing 
the development of cancer in a mammal is referred to as a "prophylactically effective dose." 
The particular dose required for a prophylactic treatment will depend upon the medical 
condition and history of the mammal, the particular cancer being prevented, as well as other 
factors such as age, weight, gender, administration route, efficiency, etc. Such prophylactic 
treatments may be used, e.g., in a mammal who has previously had cancer to prevent a 
recurrence of the cancer, or in a mammal who is suspected of having a significant likelihood 
of developing cancer, e.g., based partly on gene expression profiles. 

It will be appreciated that the present prostate cancer protein-modulating compounds 
can be administered alone or in combination with additional prostate cancer modulating 
compounds or with other therapeutic agents, e.g., other anti-cancer agents or treatments. 

In numerous embodiments, one or more nucleic acids, e.g., polynucleotides 
comprising nucleic acid sequences set forth in Tables 1A-4, such as antisense polynucleotides, 
silencing RNA, or ribozymes, will be introduced into cells, in vitro or in vivo. The present 
invention provides methods, reagents, vectors, and cells useful for expression of prostate 
cancer-associated polypeptides and nucleic acids using in vitro (cell-free), ex vivo or in vivo 
(cell or organism-based) recombinant expression systems. 

The particular procedure used to introduce the nucleic acids into a host cell for 
expression of a protein or nucleic acid is application specific. Many procedures for 
introducing foreign nucleotide sequences into host cells may be used. These include the use 
of calcium phosphate transfection, spheroplasts, electroporation, liposomes, microinjection, 
plasma vectors, viral vectors, and many other well known methods for introducing cloned 
genomic DNA, cDNA, synthetic DNA, or other foreign genetic material into a host cell (see, 
e.g., Berger and Kimmel (1987) Guide to Molecular Cloning Techniques from Methods in 
Enzymology (vol. 152) Academic Press; Ausubel, et al. (eds. supplemented through 1999) 
Current Protocols Lippincott; and Sambrook, et al. (1989) Molecular Cloning: A Laboratory 
Manual (2d ed., Vol. 1-3) CSH Press). 

In a preferred embodiment, prostate cancer proteins and modulators are administered 
as therapeutic agents, and can be formulated as outlined above. Similarly, prostate cancer 
genes (including both the full-length sequence, partial sequences, or regulatory sequences of 
the prostate cancer coding regions) can be administered in a gene therapy application. These 
prostate cancer genes can include antisense applications, either as gene therapy (i.e., for 
incorporation into the genome) or as antisense compositions, as will be appreciated by those 
in the art. 

Prostate cancer polypeptides and polynucleotides can also be administered as vaccine 
compositions to stimulate HTL, CTL, and antibody responses. Such vaccine compositions 
can include, e.g., lipidated peptides (see, e.g., Vitiello, et al. (1995) J. Clin. Invest. 
95:341-349), peptide compositions encapsulated in poly(DL-lactide-co-glycolide) ("PLG") 
microspheres (see, e.g., Eldridge, et al. (1991) Molec. Immunol. 28:287-294; Alonso, et al. 
(1994) Vaccine 12:299-306; Jones, et al. (1995) Vaccine 13:675-681), peptide compositions 
contained in immune stimulating complexes (ISCOMS) (see, e.g., Takahashi, et al. (1990) 
Nature 344:873-875; Hu, et al. (1998) Clin Exp Immunol. 113:235-243), multiple antigen 
peptide systems (MAPs) (see, e.g., Tam (1988) Proc. Natl. Acad. Sci. USA 85:5409-5413; 
Tam (1996) J. Immunol. Methods 196:17-32), peptides formulated as multivalent peptides; 
peptides for use in ballistic delivery systems, typically crystallized peptides, viral delivery 
vectors (Perkus, et al., p. 379, in Kaufmann (ed. 1996) Concepts in Vaccine Development de 
Gruyter; Chakrabarti, et al. (1986) Nature 320:535-537; Hu, et al. (1986) Nature 
320:537-540; Kieny, et al. (1986) AIDS Bio/Technology 4:790-xxx; Top, et al. (1971) J. Infect. Dis. 
124:148-154; Chanda, et al. (1990) Virology 175:535-547), particles of viral or synthetic 
origin (see, e.g., Kofler, et al. (1996) J. Immunol. Methods 192:25-35; Eldridge, et al. (1993) 
Sem. Hematol. 30:16-24; Falo, et al. (1995) Nature Med. 7:649-653), adjuvants (Warren, et 
al. (1986) Annu. Rev. Immunol. 4:369-388; Gupta, et al. (1993) Vaccine 11:293-306), 
liposomes (Reddy, et al. (1992) J. Immunol. 148:1585-1589; Rock (1996) Immunol. Today 
17:131-137), or, naked or particle absorbed cDNA (Ulmer, et al. (1993) Science 
259:1745-1749; Robinson, et al. (1993) Vaccine 11:957-960; Shiver, et al., p. 423, in Kaufmann (ed. 
1996) Concepts in Vaccine Development de Gruyter; Cease and Berzofsky (1994) Annu. 
Rev. Immunol. 12:923-989; and Eldridge, et al. (1993) Sem. Hematol. 30:16-24). 
Toxin-targeted delivery technologies, also known as receptor mediated targeting, such as those of 
Avant Immunotherapeutics, Inc. (Needham, Massachusetts) may also be used. 

Vaccine compositions often include adjuvants. Many adjuvants contain a substance 
designed to protect the antigen from rapid catabolism, such as aluminum hydroxide or 
mineral oil, and a stimulator of immune responses, such as lipid A, Bordetella pertussis or 
Mycobacterium tuberculosis derived proteins. Certain adjuvants are commercially available 
as, e.g., Freund's Incomplete Adjuvant and Complete Adjuvant (Difco Laboratories, Detroit, 
MI); Merck Adjuvant 65 (Merck and Company, Inc., Rahway, NJ); AS-2 (SmithKline 
Beecham, Philadelphia, PA); aluminum salts such as aluminum hydroxide gel (alum) or 
aluminum phosphate; salts of calcium, iron or zinc; an insoluble suspension of acylated 
tyrosine; acylated sugars; cationically or anionically derivatized polysaccharides; 
polyphosphazenes; biodegradable microspheres; monophosphoryl lipid A; and quil A. 
Cytokines, such as GM-CSF, interleukin-2, -7, -12, and other like growth factors, may also be 
used as adjuvants. 

Vaccines can be administered as nucleic acid compositions wherein DNA or RNA 
encoding one or more of the polypeptides, or a fragment thereof, is administered to a patient. 
This approach is described, for instance, in Wolff, et al. (1990) Science 247:1465-1468 as 
well as U.S. Patent Nos. 5,580,859; 5,589,466; 5,804,566; 5,739,118; 5,736,524; 5,679,647; 
WO 98/04720; and in more detail below. Examples of DNA-based delivery technologies 
include "naked DNA", facilitated (bupivacaine, polymers, peptide-mediated) delivery, 
cationic lipid complexes, and particle-mediated ("gene gun") or pressure-mediated delivery 
(see, e.g., U.S. Patent No. 5,922,687). 

For therapeutic or prophylactic immunization purposes, the peptides of the invention 
can be expressed by viral or bacterial vectors. Examples of expression vectors include 
attenuated viral hosts, such as vaccinia or fowlpox. This approach involves the use of 
vaccinia virus, e.g., as a vector to express nucleotide sequences that encode prostate cancer 
polypeptides or polypeptide fragments. Upon introduction into a host, the recombinant 
vaccinia virus expresses the immunogenic peptide, and thereby elicits an immune response. 
Vaccinia vectors and methods useful in immunization protocols are described in, e.g., U.S. 
Patent No. 4,722,848. Another vector is BCG (Bacille Calmette Guerin). BCG vectors are 
described in Stover, et al. (1991) Nature 351:456-460. A wide variety of other vectors useful 
for therapeutic administration or immunization, e.g., adeno and adeno-associated virus 
vectors, retroviral vectors, Salmonella typhi vectors, detoxified anthrax toxin vectors, and the 
like, will be apparent to those skilled in the art from the description herein (see, e.g., Shata, et 
al. (2000) Mol. Med. Today 6:66-71; Shedlock, et al. (2000) J. Leuk. Biol. 68:793-806; Hipp, 
et al. (2000) In Vivo 14:571-85). 

Methods for the use of genes as DNA vaccines are well known, and include placing a 
prostate cancer gene or portion of a prostate cancer gene under the control of a regulatable 
promoter or a tissue-specific promoter for expression in a prostate cancer patient. The 
prostate cancer gene used for DNA vaccines can encode full-length prostate cancer proteins, 
but more preferably encodes portions of the prostate cancer proteins including peptides 
derived from the prostate cancer protein. In one embodiment, a patient is immunized with a 
DNA vaccine comprising a plurality of nucleotide sequences derived from a prostate cancer 
gene. For example, prostate cancer-associated genes or sequence encoding subfragments of a 
prostate cancer protein are introduced into expression vectors and tested for their 
immunogenicity in the context of Class I MHC and an ability to generate cytotoxic T cell 
responses. This procedure may provide for production of cytotoxic T lymphocyte responses 
against cells which present antigen, including intracellular epitopes. 

In a preferred embodiment, the DNA vaccines include a gene encoding an adjuvant 
molecule with the DNA vaccine. Such adjuvant molecules include cytokines that increase 
the immunogenic response to the prostate cancer polypeptide encoded by the DNA vaccine. 
Additional or alternative adjuvants are available. 

In another preferred embodiment, prostate cancer genes find use in generating animal 
models of prostate cancer. When the prostate cancer gene identified is repressed or 
diminished in cancer tissue, gene therapy technology, e.g., wherein antisense RNA is directed 
to the prostate cancer gene, will also diminish or repress expression of the gene. Animal 
models of prostate cancer find use in screening for modulators of a prostate cancer-associated 
sequence or modulators of prostate cancer. Similarly, transgenic animal technology 
including gene knockout technology, e.g., as a result of homologous recombination with an 
appropriate gene targeting vector, will result in the absence or increased expression of the 
prostate cancer protein. When desired, tissue-specific expression or knockout of the prostate 
cancer protein may be necessary. 

It is also possible that the prostate cancer protein is overexpressed in prostate cancer. 
As such, transgenic animals can be generated that overexpress the prostate cancer protein. 
Depending on the desired expression level, promoters of various strengths can be employed 
to express the transgene. Also, the number of copies of the integrated transgene can be 
determined and compared for a determination of the expression level of the transgene. 
Animals generated by such methods find use as animal models of prostate cancer and are 
additionally useful in screening for modulators to treat prostate cancer. 

Kits for Use in Diagnostic and/or Prognostic Applications 

For use in diagnostic, research, and therapeutic applications suggested above, kits are 
also provided by the invention. In the diagnostic and research applications such kits may 
include one of the following: assay reagents, buffers, prostate cancer-specific nucleic acids or 
antibodies, hybridization probes and/or primers, antisense polynucleotides, silencing RNA, 
ribozymes, dominant negative prostate cancer polypeptides or polynucleotides, small 
molecule inhibitors of prostate cancer-associated sequences, etc. A therapeutic product may 
include sterile saline or another pharmaceutically acceptable emulsion and suspension base. 

In addition, the kits may include instructional materials containing instructions (i.e., 
protocols) for the practice of the methods of this invention. While the instructional materials 
typically comprise written or printed materials, they are not limited to such. A medium 
capable of storing such instructions and communicating them to an end user is contemplated 
by this invention. Such media include, but are not limited to, electronic storage media (e.g., 
magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like. Such 
media may include addresses to internet sites that provide such instructional materials. 

The present invention also provides for kits for screening for modulators of prostate 
cancer-associated sequences. Such kits can be prepared from readily available materials and 
reagents. For example, such kits can comprise one or more of the following materials: a 
prostate cancer-associated polypeptide or polynucleotide, reaction tubes, and instructions for 
testing prostate cancer-associated activity. Optionally, the kit contains biologically active 
prostate cancer protein. A wide variety of kits and components can be prepared according to 
the present invention, depending upon the intended user of the kit and the particular needs of 
the user. Diagnosis would typically involve evaluation of a plurality of genes or products. 
The genes will be selected based on correlations with important parameters in disease which 
may be identified in historical or outcome data. 

EXAMPLES 

Example 1: Gene Chip Analyses of Expression Profiles 

Molecular profiles of various normal and cancerous tissues were determined and 
analyzed using gene chips. RNA was isolated and gene chip analysis was performed as 
described (Glynne, et al. (2000) Nature 403:672-676; Zhao, et al. (2000) Genes Dev. 
14:981-993). 

Example 2: Identification of androgen dependent/independent genes 

To identify gene expression changes during the transition from androgen-dependent to 
androgen-independent prostate cancer, oligonucleotide microarrays ("K" chips or Affymetrix 
Eos Hu03) were interrogated with cRNAs derived from the human CWR22 prostate cancer 
xenograft model propagated in nude mice (Pretlow, et al. (1993) J. Natl. Cancer Inst. 
85:394-398). The CWR22 xenograft is androgen-dependent when grown in male nude mice. 
Androgen-independent sub-lines can be derived by first establishing androgen-dependent 
tumors in male mice. The mice are then castrated to remove the primary source of growth 
stimulus (androgen), resulting in tumor regression. Within 3-10 months molecular events 
prompt the tumors to relapse and start growing as androgen-independent tumors. See, e.g., 
Nagabhushan, et al. (1996) Cancer Res. 56:3042-3046; Amler, et al. (2000) Cancer Res. 
60:6134-6141; and Bubendorf, et al. (1999) J. Natl. Cancer Inst. 91:1758-1764. 

Using the CWR22 xenograft model, tumors were grown subcutaneously in male nude 
mice. Tumors were harvested at different times after castration. The time points 
post-castration included (in days): 0, 1, 3, 4, 5, 10, 30, 40, 50, 51, 52, 59, 60, 61, 70, 79, 80, 82, 
120, and 125. Analyses also included established androgen-independent xenografts. 
Castration resulted in tumor regression. At day 120 and thereafter, the tumors relapsed and 
started growing in the absence of androgen. 

cRNAs were generated by in vitro transcription assays (IVTs) from the different 
samples and were hybridized to the oligonucleotide microarrays (Affymetrix Eos Hu03). 
Hybridization was measured by the average fluorescence intensity (AI), which is directly 
proportional to the expression level of the gene. 

Two types of analyses were applied to the results: 

Analysis A: 

The samples were divided into different time groups which included the following 
time points post castration (in days): 1-5, 10, 30-40, 50-82, 120-125. To identify changes in 
gene expression, the following calculations were made: 

1. The median (or mean, in case there were only 2 samples in a group) was calculated 
for each group. 

2. The medians (or means) for each group were compared to one another. 

3. Genes were selected that exhibited a minimum 2 fold difference in the median (or 
mean) between any of the groups. 

4. The change in gene expression over time was analyzed for each selected gene to look 
for specific pattern changes. 
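
The first three calculations of Analysis A can be illustrated by the following sketch. It is 
written in Python and is offered as an illustration under stated assumptions, not as the 
software actually used: the function names and the example intensities are hypothetical, day 0 
is skipped because it is not listed among the Analysis A time groups, and group pairs with a 
non-positive lower value are ignored to avoid division by zero. 

```python
from itertools import combinations
from statistics import mean, median

# Time groups of Analysis A (days post castration), as listed in the text.
TIME_GROUPS = {
    "1-5": range(1, 6),
    "10": range(10, 11),
    "30-40": range(30, 41),
    "50-82": range(50, 83),
    "120-125": range(120, 126),
}


def group_of(day):
    """Return the name of the time group containing `day`, or None."""
    for name, days in TIME_GROUPS.items():
        if day in days:
            return name
    return None  # e.g., day 0, which is not in any Analysis A group


def group_summaries(samples):
    """samples: list of (day, intensity) pairs for one gene.

    Step 1: median per group, or mean when a group has only 2 samples.
    """
    grouped = {}
    for day, value in samples:
        name = group_of(day)
        if name is not None:
            grouped.setdefault(name, []).append(value)
    return {g: (mean(v) if len(v) == 2 else median(v)) for g, v in grouped.items()}


def passes_fold_filter(summaries, min_fold=2.0):
    """Steps 2 and 3: keep the gene if any two groups differ by >= min_fold."""
    for a, b in combinations(summaries.values(), 2):
        low, high = sorted((a, b))
        if low > 0 and high / low >= min_fold:
            return True
    return False


# Hypothetical intensities for a single gene, for illustration only.
samples = [(1, 400), (3, 420), (5, 380), (10, 150), (30, 100), (40, 120),
           (50, 110), (60, 90), (80, 100), (120, 390), (125, 410)]
summaries = group_summaries(samples)
selected = passes_fold_filter(summaries)
```

Genes for which `selected` is true would then proceed to the pattern analysis of step 4. 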

Only genes with an interesting expression pattern during the androgen-ablation time 
course were selected as potential new therapeutic targets and/or diagnostic markers. Among 
the 70,000 gene clusters present on Hu01 and Hu02, we identified 820 gene clusters with the 
desired expression patterns. These expression patterns can be broadly defined into the 
following categories: 

1. Genes that are expressed early in the time course, then drop off in expression, and 
then express again with emergence of androgen-independence (hi-lo-hi pattern in Table 1A). 

2. Genes that are expressed early in the time course, then drop off in expression, and do 
not express again with emergence of androgen-independence (hi-lo-lo pattern in Table 1A). 

3. Genes that are not expressed early in the time course, but express only with 
emergence of androgen-independence (lo-lo-hi pattern in Table 1A). 

4. Genes that are not expressed early in the time course, but then express as androgen is 
withdrawn and continue to express with emergence of androgen-independence (lo-hi-hi 
pattern in Table 1A). 

5. Genes that are not expressed early in the time course, but then express as androgen is 
withdrawn and drop off again with emergence of androgen-independence (lo-hi-lo pattern in 
Table 1A). 
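
By way of illustration, the five categories above can be expressed as a lookup from a 
three-phase hi/lo call to a group number. The sketch below is in Python; the threshold used to 
call a phase "hi" or "lo", the function name, and the example values are hypothetical 
assumptions, since the specification does not state how the calls were made. 

```python
# Pattern-to-group mapping taken from the five categories listed above.
PATTERN_GROUPS = {
    ("hi", "lo", "hi"): 1,
    ("hi", "lo", "lo"): 2,
    ("lo", "lo", "hi"): 3,
    ("lo", "hi", "hi"): 4,
    ("lo", "hi", "lo"): 5,
}


def classify_pattern(early, withdrawal, independent, threshold):
    """Return (pattern string, group number or None) for one gene.

    `threshold` is a hypothetical cut-off separating "hi" from "lo".
    """
    calls = tuple(
        "hi" if value >= threshold else "lo"
        for value in (early, withdrawal, independent)
    )
    return "-".join(calls), PATTERN_GROUPS.get(calls)


# Hypothetical expression values, for illustration only.
pattern, group = classify_pattern(400, 100, 400, threshold=200)
```

A pattern that matches none of the five categories (e.g., hi-hi-hi) yields no group number 
and would not be selected. 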

Group 1 is characterized by cell-cycle regulating genes, such as those encoding 
cyclin B1, p21/WAF1, CDC18-homolog, cyclin A2, cyclin D1, and possible growth factors 
such as hAG2 (anterior gradient 2 homolog) among others. This indicates that interruption of 
growth factor and/or cell cycle pathways prevents the emergence of androgen-independent 
disease, making group 1 genes good targets for treating advanced prostate cancer. 

Group 2 represents genes that are androgen-dependent, and do not re-express due to 
the lack of androgen signal in the androgen-independent phenotype. This group includes 
genes encoding proteins such as Fibronectin 1, which has been previously shown to be 
down-regulated with androgen-withdrawal (Amler, et al. (2000) Cancer Res. 60:6134-6141). 

Group 3 represents genes that are up-regulated by signals that induce the 
androgen-independent phenotype. This group includes genes encoding stanniocalcin 2, c-fos 
proto-oncogene product, vascular endothelial growth factor, the cell surface protein transmembrane 
4 superfamily member 1 and adrenomedullin among others. Adrenomedullin has recently 
been shown to act as an autocrine growth factor for the androgen-independent prostate cancer 
cell line DU145 (Rocchi, et al. (2001) Cancer Res. 61:1196-1206), indicating that its 
up-regulation is critical for supporting an androgen-independent phenotype. Blocking 
adrenomedullin function, and/or other genes in this group, prevents the growth of 
androgen-independent tumor cells. 

Group 4 represents genes that are androgen-repressed and are only expressed in the 
absence of androgen. This group includes genes encoding the protein tyrosine phosphatase 
interacting protein liprin-alpha 2, the CD24 antigen, and the catalytic subunit for 
phosphatidylinositol 4-kinase amongst others. Patients that are treated for advanced prostate 
cancer by hormone-ablation may have in their bodies cells that have survived 
hormone-ablation and are likely to up-regulate genes that belong to Group 4. Therefore, Group 4 gene 
products are particularly good therapeutic targets for treating patients undergoing 
hormone-ablation therapy. 

Group 5 represents genes that are involved in regulating signals that induce an 
androgen-independent phenotype. This group includes genes encoding Rab2 (a Ras-like G 
protein), the Son of Sevenless homolog (a GTP/GDP exchange factor involved in activating 
Ras-like proteins), and the p85 regulatory subunit for phosphoinositide-3-kinase (PI3-kinase). 
The PI3-kinase pathway has been implicated in providing a survival signal to the prostate 
cancer cell line LNCaP (Lin, et al. (1999) Cancer Res. 59:2891-2897). This indicates that 
ras-like signals and signals dependent on PI3-kinase are involved in inducing the 
androgen-independent phenotype. For that reason, Group 5 gene products are particularly good 
therapeutic targets for treating patients undergoing hormone-ablation therapy. 

Analysis B: 

For the second analysis, the samples were divided into 4 time groups which included 
the following time points post castration (in days): 0-1, 3-5, 10-82, >120. To identify 
changes in gene expression, the following analysis was performed: 

1. Genes were selected that exhibited a minimum of 100 AI units at the 90th percentile 
expression level of samples. 

2. The group mean expression levels for each gene were calculated. The genes were further 
sub-selected to exhibit a minimum 3 fold difference between the group means. 

3. An analysis of variance was then performed on selected genes. From the original 59,680 
gene clusters present on the Hu03 gene chip, only about 1165 genes with a P value of < 0.01 
were identified that also exhibited the above mentioned parameters. 

4. A method was then employed for calculating the positive false discovery rate (pFDR), i.e., 
an estimate of the proportion of false-positives present in a set of findings (Storey and 
Tibshirani (2001) Technical Report, Department of Statistics, Stanford University, CA). 
This technique was developed explicitly for use with microarray data. The procedure 
involves randomly assigning the membership status of each sample to a group and 
re-performing the analysis of variance. In each simulation, the number of group members (6 for 
group 1, 9 for group 2, 15 for group 3, and 4 for group 4) remained constant, but these 
designations were shuffled and assigned to each sample at random. The permutation was 
performed 1000 times, and for each simulation, the number of findings at P < 0.01 was noted. 
The number of false positives under null conditions was then divided by the number of 
actual findings (n=1165 genes) to obtain an estimate of the proportion of false positive 
findings. After the application of a correction factor, the final estimate for the pFDR was 
about 1%. Thus, one can expect that approximately 12 of the 1165 findings are false 
positives. 

5. The approximately 1165 genes were clustered by expression pattern to identify specific
pattern changes. Only genes with an interesting expression pattern during the
androgen-ablation time course were selected as potential new therapeutic targets and/or diagnostic
markers. These expression patterns can be broadly defined into the following categories:

1. Genes that are expressed early in the time course of androgen withdrawal, then drop off in
expression, and then express again with emergence of androgen-independence (hi-lo-lo-hi
pattern in Table 2A).

2. Genes that are expressed early in the time course, then drop off in expression immediately
after androgen withdrawal, and do not express again with emergence of
androgen-independence (hi-lo-lo-lo pattern in Table 2A).

3. Genes that are expressed early in the time course, then drop off in expression after several
days of androgen withdrawal, and do not express again with emergence of
androgen-independence (hi-hi-lo-lo pattern in Table 2A).

4. Genes that are not expressed early in the time course, but express only with emergence of 
androgen-independence (lo-lo-lo-hi pattern in Table 2A). 

5. Genes that are not expressed early in the time course, but then express as androgen is
withdrawn and continue to express with emergence of androgen-independence (lo-lo-hi-hi
pattern in Table 2A).

6. Genes that are not expressed early in the time course, but then express as androgen is 
withdrawn and drop off again with emergence of androgen-independence (lo-lo-hi-lo pattern 
in Table 2A). 
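
Steps 1, 2 and 4 of the analysis above can be illustrated with a minimal sketch. It is not the procedure actually used: the function names are hypothetical, the analysis of variance is represented only by a caller-supplied `count_findings` callback (assumed to return the number of genes with P < 0.01 for a given labelling), the per-simulation counts are averaged as an assumption, and the correction factor mentioned in step 4 is not modelled.

```python
import random
from statistics import mean

GROUP_SIZES = [6, 9, 15, 4]  # samples per time group, as stated in step 4


def make_labels(sizes=GROUP_SIZES):
    """One group label (1..4) per sample, in group order."""
    return [g for g, n in enumerate(sizes, start=1) for _ in range(n)]


def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def group_means(values, labels):
    """Mean expression level per group label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in groups.items()}


def passes_filters(values, labels, min_level=100.0, min_fold=3.0):
    """Step 1 (90th percentile >= 100 units) and step 2 (>= 3-fold between group means)."""
    if percentile(values, 90) < min_level:
        return False
    means = list(group_means(values, labels).values())
    lowest = max(min(means), 1e-9)  # assumption: guard against division by zero
    return max(means) / lowest >= min_fold


def false_positive_fraction(count_findings, labels, actual, n_perm=1000, seed=0):
    """Step 4 before any correction factor: mean null findings / actual findings."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes stay constant, membership is randomized
        null_counts.append(count_findings(shuffled))
    return mean(null_counts) / actual
```

With a pFDR of about 1% and 1165 findings, the expected number of false positives is 0.01 x 1165, or roughly 12, which matches the figure given in step 4.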

Group 1 is characterized by cell-cycle regulating genes and cell growth promoting
genes, such as those encoding cyclin B1 and CDC45 among others, growth factors/hormones
such as hAG2 (anterior gradient 2 homolog), adrenomedullin, and stanniocalcin 2 among
others, and growth factor receptors, such as the bone morphogenic protein receptor type IB
(BMP-RIB) and the endothelial differentiation lysophosphatidic acid G-protein-coupled
receptor 7 among others. Adrenomedullin has recently been shown to act as an autocrine
growth factor for the androgen-independent prostate cancer cell line DU145 (Rocchi, et al.
(2001) Cancer Res. 61:1196-1206), indicating that its up-regulation is critical for supporting
an androgen-independent phenotype. This indicates that interruption of growth factor and/or
cell cycle pathways prevents the emergence of androgen-independent disease, making group
1 genes good targets for treating both localized and advanced prostate cancer and related
conditions.

Group 2 represents genes that are androgen-dependent, and do not re-express due to
the lack of androgen signal in the androgen-independent phenotype. This group includes
genes encoding proteins such as the endothelial protein C receptor (EPCR) and the potassium
intermediate/small conductance calcium-activated channel (subfamily N, member 2). These
genes represent targets for treating androgen-dependent prostate cancer and related
conditions.

Group 3 also represents genes that are androgen-dependent, and do not re-express due
to the lack of androgen signal in the androgen-independent phenotype. This group includes
genes encoding proteins such as fibronectin 1, which has been previously shown to be
down-regulated with androgen withdrawal (Amler, et al. (2000) Cancer Res. 60:6134-6141), and
genes encoding signaling proteins such as Rho GTPase activating protein 1. These genes
represent targets for treating androgen-dependent prostate cancer and related conditions.

Group 4 represents genes that are up-regulated by signals that induce and maintain the
androgen-independent phenotype. This group includes genes encoding potential growth
promoting proteins such as chemokine-like factor (Unigene ID Hs.15159), colon
cancer-associated protein Mic1, and the mitogen-activated protein kinase-activated protein kinase 2.
Blocking the function of these proteins, and/or of other gene products in this group, prevents
the growth of androgen-independent tumor cells and related conditions.

Group 5 represents genes that are androgen-repressed and are only expressed in the
absence of androgen or that are induced by the absence of androgen. This group includes
genes encoding transcriptional regulators such as the androgen receptor, the DNA-activated
protein kinase (catalytic subunit), and nuclear factor related to kappa B binding protein
(NFRKB), among others. Patients that are treated for advanced prostate cancer by
hormone-ablation may have in their bodies cells that have survived hormone-ablation and are likely to
up-regulate genes that belong to Group 5. Therefore, Group 5 gene products are particularly
good therapeutic targets for treating patients undergoing hormone-ablation therapy.

Group 6 represents genes that are involved in regulating signals that are induced
during androgen withdrawal and that induce an androgen-independent phenotype. This group
includes genes encoding signaling molecules such as phosphoinositide-3-kinase (class 2,
alpha polypeptide), signal transducer and activator of transcription 2 (STAT2), phospholipase
A2 (group IIA) and the protein tyrosine phosphatase interacting protein liprin-alpha 2, cell
surface receptors such as gamma-aminobutyric acid (GABA) A receptor epsilon subunit and G
protein-coupled receptor 48, and immune function proteins such as the major
histocompatibility complex class II DR alpha. The PI3-kinase pathway has been implicated
in providing a survival signal to the prostate cancer cell line LNCaP (Lin, et al. (1999) Cancer
Res. 59:2891-2897). This indicates that Ras-like signals and signals dependent on PI3-kinase
are involved in inducing the androgen-independent phenotype. For that reason, Group 6 gene
products are particularly good therapeutic targets for treating patients undergoing
hormone-ablation therapy.

TABLE 1A provides Accession numbers for genes, including expressed sequence tags, (incorporated in their entirety here and throughout the application where Accession
numbers are provided). Genes with an interesting expression pattern during the androgen-ablation time course were selected as potential new therapeutic targets and/or
diagnostic markers. 820 gene clusters were identified with desired expression patterns. These expression patterns can be broadly defined into the following categories:

1. Genes that are expressed early in the time course, then drop off in expression, and then express again with emergence of androgen-independence (hi-lo-hi pattern).

2. Genes that are expressed early in the time course, then drop off in expression, and do not express again with emergence of androgen-independence (hi-lo-lo pattern).

3. Genes that are not expressed early in the time course, but express only with emergence of androgen-independence (lo-lo-hi pattern).

4. Genes that are not expressed early in the time course, but then express as androgen is withdrawn and continue to express with emergence of androgen-independence (lo-hi-hi
pattern).

5. Genes that are not expressed early in the time course, but then express as androgen is withdrawn and drop off again with emergence of androgen-independence (lo-hi-lo
pattern).
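
The pattern labels used in Table 1A can be read as a per-stage high/low call on expression. The sketch below is only an illustration of that reading: the source does not state how "hi" and "lo" were called, so the fixed threshold of 100 units, the function names, and the three-stage input are all assumptions.

```python
# Category numbers follow the five-item list in the Table 1A legend.
CATEGORIES = {
    "hi-lo-hi": 1,
    "hi-lo-lo": 2,
    "lo-lo-hi": 3,
    "lo-hi-hi": 4,
    "lo-hi-lo": 5,
}


def pattern_label(stage_means, threshold=100.0):
    """Join a hi/lo call per stage, e.g. [250, 40, 310] -> 'hi-lo-hi'."""
    # threshold is a hypothetical cut-off, not a value taken from the source
    return "-".join("hi" if m >= threshold else "lo" for m in stage_means)


def categorize(stage_means, threshold=100.0):
    """Return (label, category); category is None for patterns not in the list."""
    label = pattern_label(stage_means, threshold)
    return label, CATEGORIES.get(label)
```

For example, mean levels of 250, 40 and 310 units across the early, androgen-withdrawn and androgen-independent stages would be labelled hi-lo-hi (category 1), while a gene that stays low throughout falls outside the listed categories.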

Table 1B lists accession numbers for primekeys lacking a UnigeneID in Table 1A. For each probeset is listed a gene cluster number from which oligonucleotides were designed.

Gene clusters were compiled using sequences derived from Genbank ESTs and mRNAs. These sequences we[...]

Alignment Tools (DoubleTwist, [...]

[...] For each predicted exon is listed genomic sequence source used for [...]



102772 U83115 Hs.161002 

^128610 N48373 Hs.10247 

102276 N4B373 Hs,10247 

100664 AQ3756 

100655 AQ3758 

135400 X78592 Hs.99915 

331363 AW582266 "Hs.91011 

115764 AW582256 "Hs.91011 



101505 
127236 
128472 
102712 
314943 
102123 
326213 
327110 
339186 



Unigene Title

absent in melanoma 1 

activated leucocyte cell adhesion molecu 



AW661857 

BE241880 

U77949 

Y00272 

NM_001809 



334899 
334900 
334902 
334905 
334906 
334951 
335044 
335753 
335755 



336721 
105012 
134470 
134750 
125819 



Hs.1139 

"Hs.251871 

Hs.86137 



100589 AW247430 

130666 AI831962 

101473 M22976 

101468 BE538296 
103546 
100829 



Hs.84113 
Hs.84152 
Hs.17409 
Hs.83834 
■Hs,181028 
■Hs,75752 
Hs.278544 
Hs.180015 



anterior gradient 2 (Xenopus laevis) hom
anterior gradient 2 (Xenopus laevis) hom
baculoviral IAP repeat-containing 5 (sur
asparagine synthetase
budding uninhibited by benzimidazoles 1
cathepsin C

CDC6 (ceil division cycle 6, S. cerevisi 

cell division cycle 2, G1 to S and G2 to 

centromere protein A (17kD) 

CH.17 hsgl|5867224 

CH.21JS gi|6117642 

CH22 DA59H18.GENSCA.N.72-13 

CH22_EI^:AC000097.GENSCAN.109-2 

CH22_EI«AC000097.GENSCAN.67-4 

CH22^HkAC00a097.GENSCAN£7-6 

CH2?J=GENES.173J 

CH22J=GENES,173J 

CH2?J^GENES.275_1 

CH22_FGENES.275.3 

CH22_FGENES,279.2 

CH22_FGENES.2eO_2 

CH22_FGENES,3_2 

CH22_FGENES.327_59 

CH22_FGENES.397_18 

CH22_FGENES.411J5 

CH22_FGENES.452_13 

CH22_FGENES,462J4 

CH22J^GENES,462_16 

CH22_FGENES,452_20 

CH22_FGENES,462_21 

CH22_FGENE3,466_20 

CH22 FGENES.480 1 



CH22J^GENES.604_4 



■Hs.194698 cyclinBS 



chromosome 20 open reading frame 1 
CDC28 protein kinase 2 
cold shock domain protein A 

CTP synthase 



cycllnDI (PRAD1: parathyroid; _ 
cyclin-dependent kinase inhibitor 3 (CDK 
cystathlonlne-bela-synthase 
cystelne-rlch protein 1 (intestinal) 



hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi



hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi 
hi-lo-hi 
hi-lo-hi 
hi-lo-hi 
hi-lo-hi 
hi-lo-hi 



cytochrome c o 
acetyl-Coenzyme A acetyltransferase 2 (a 
□.Jopachrome tautomerase 



AI815395
BE250162 
W24087 
BE543205 
AF191019 
AL117452



303274 
301804 
300661 
304641 
304621 



105963 
106286 
106889 
109220 
113158 



115522 
115652 
116121 

116130 
116448 



130680 
131148 
131626 
131937 
131965 
132371 
133626 
300942 
300953 
302656 
311928 
313637 
313832 
316466 
317202 
320771 
321636 



108886 
129241 
104978 
129626 



332577 
116732 
106774 
108818 



T62216 
AK000742 
AW271106 
AW574774 



DKFZP565A0522 protein
hypothetical protein, eatradic
DKFZP586G1517 protein
DNA (cytosine-5-)-methyltransferase 1
CDC45 (cell division cycle 45, S.cerevis
hypothetical protein MGC10334
hypothetical protein FLJ10549
anillin (Drosophila Scraps homolog), act



A1698252 

BE613348 

AW575008 

BE410556 

AI765107 

U46258 

AW958181 



Hs.1 84641 
"Hs.83766 
HS.7S235 

HS.B361 
"Hs.44155 
"Hs.77462 
Hs.1 14311 
Hs.302446 
"Hs.1 21 028 
Hs.62130 
HS.S2180 

Hs.104859 hypothetical protein DKFZp762E1312 
Hs.169476 glyceraldehyde-3-pliosphate dehydragenase 
gb:zx82o11.s1 Soaresovaiy tumor NbHOTH 

■Hs.83766 " 

Hs.83765 
Hs.62180 
HE.62180 
Hs.104741 
Hs.24596 
Hs.105106 
Hs.22971 
Hs.37310 
"Hs.23348 
Hs.1 1355 



Hs.23 



AW362955 
BE327311 
AA687322 
AA412049 
BE536911 
AA464414 
AF217515 
AI375726 
BE667313 
AW953575 



Hs.24641 
■Hs.293380 
Hs.1 33260 
"Hs.1 22579 
Hs,283099 
HS59346 
Hs.47378 
HS.3817B 
Hs.48855 
HS.3817B 
Hs.208912 
Hs.1 5641 
Hs.47166 



PDZ-binding kinase; T-cell originated pr
RAD51-interacting protein
Homo sapiens cAMP-dependent protein kina

ESTs
ESTs

S-phase kinase-associated protein 2 (p45
thymopoietin

hypothetical protein STRAIT11499
hypothetical protein FLJ205Sa
HSPC145 protein
ESTs

cytoskeleton associated protein 2

ESTs

hypothetical protein FLJ20354
hypothetical protein FLJ10461
AF15q14 protein
hypothetical protein FLJ10614
— , Moderately similar to T50635 hypot

hypothetical protein FLJ23468
hypothetical protein FLJ10468
hypothetical protein FLJ23468
hypothetical protein MGC861
ESTs



Hs.1 a 

"Hs.303125 
■Hs.289092 
Hs.21446 
Hs.35962 
Hs.46677 
Hs.75277 
Hs.1 22903 
Hs.294088 
Hs.70704 
Hs.270840 
Hs.126774 
Hs.133294 
HS.121S92 
Hs.181181 
Hs.1 171 76 
Hs.193465 
Hs.221197 
Hs.159420 
Hs .286049 
Hs.301539 
Hs.237164 
Hs.103291 
Hs.271252 
Hs.91521 
Hs.1 09706 
Hs.ig322 
Hs.l 11334 
Hs.31097 
Hs.27769 
Hs.165909 
Hs.14587 
Hs,3C3116 



Homo sapiens NUF2R mRNA, complete cds 
gb:2x78g01.s1 Scares ovaiy tumor NbHOT H 

hypothetical protein 

monoamine oxidase A 

p53-induced protein PIGPC1 

Homo sapiens cDNA: FLJ22380 lis, clone H 

Homo sapiens mRNA for K1AA1716 protein, 

ESTs 

PR02000 protein 

hypothetkial protein FLJ1391C 

Homo sapiens, clone 1MAGE:3048353, mRNA. 

ESTs 

Homo sapiens, clone MGE:2823731, mRI^A, 



phosphoserine aminokansferase 
hypothetical protein IUGC2633 
ESTs, Highly similar to LDHH JIUMAN L-LA 

ESTs 

hypothetical protein 

hematological and neurological expressed 

ESTs, Weakly similar to CGHLI7L collagen 

ferritin, light polypeptide 

hypothetical protein FLJ21478 

ESTs, Weakly sirailartoMCAT HUMAN IVIITOC 



hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi



hi-lo-hi
hi-lo-hi
hi-lo-hi
hi-lo-hi



hi-lo-hi
hi-lo-hi
315618
132959 



107129 
102696 
101753 
101597 
133512 



130350 
101046 
101544 



100154 
100199 
100372 
100387 

131514 



101396 
119018 
101840 
332640 



101148 
103076 
102212 



AI287341 ■Hs.154029 

AA379597 Hs,5ig9 

AW014195 Hs,61472 

AA351647 Hs.2642 

D79987 Hs,163479 

BE244377 ■Hs.48876 

AF174600 Hs. 

X02761 -Hs.; 

AA452181 Hs.77643 

U71321 Hs.7557 

AC004770 ■Hs.4756 

BE54Q274 Hs.239 

L11144 HS.19C7 

AA317089 ■Hs.597 
L18861 

X14850 Hs.147097 

BE561617 'Hs.119192 

J04088 ■Hs.156346 

AA316181 Hg,6163S 

H15474 Hs,132898 

AK002011 Hs.37558 

BE621719 Hs.173802 

H59799 Hs,42644 

AF151076 Hs.25199 

AFa53305 Ha.98658 

AF053306 Hs,36708 

AW361638 Hs,278338 
U22951 

AL119964 Hs.75616 

NM_014767 Hs.74583 

D21262 H5,76337 

AA369601 Hs,239138 
J05614 



MJ01034 Hs.76319 



NM_014214 Hs.5753 

AU)39104 Hs,159557 

H60720 Hs.81892 

BE562298 Hs.71827 

NHt014791 H6.184339 

D83777 "Hs.75137 

BE270734 'Hs.2795 

W27618 Hs.234489 

BE617695 Hs,286192 

BE300094 "Hs.227751 

BE300094 ■Hs.227751 

AU076611 Hs.154672 

AW067805 Hs.172665 

AI869865 H5.164443 

A1132988 Hs.109052 

BE091926 Hs.16244 

AW500470 Hs,1 17950 

BE242818 "Hs.179606 

AW247262 Hs.76614 

M81740 Hs,76212 

BE299197 Hs.179665 

BE294407 "Hs.99910 

BE206854 Hs.46039 

AF052649 ■Hs.252587 

Ma7399 Hs.44 

X90725 Hs.77597 

AI752235 Hs.41270 

BE267931 'HG.78g96 

AA631143 HS.179B09 

AA236291 Hs.183583 

BE568452 Hs.5101 

BE568452 Hs.5101 

AA371931 ■Hs.77422 

AA219591 Hs.73625 

AC004770 "Hs.4756 

BE614410 Hs.23044 

AA227059 Hs.173737 



bHLH factor Hes4 

HSPC150 protein similar to ubiquitin-con 
ESTs, Weakly similar to unknown [S.cerev
eukaryotic translation elongation factor



.193380 F-box protein Fbx20 



FK506-binding protein 1B (12.6 kD)

FK506-binding protein 5

flap structure-specific endonuclease 1

forkhead box M1

iGolli-mbpgene.exi
H2A histone family, member X
H2A histone family, member Z
topoisomerase (DNA) II alpha (170kD)
six transmembrane epithelial antigen of
fatty acid desaturase 1
hypothetical protein FLJ11149
KIAA0603 gene product 
thioredoxin-like 
hypothetical protein 

budding uninhibited by benzimidazoles 1 
budding uninhibited by benzimidazoles 1 
LGI^ protein 

gb:Human mRNA clone with similarity to L 

KIAA0275 gene product 
nucleolar phosphoprotein p130 
pre-B-cell colony-enhancing factor 
gb:Human proliferating cell nuclear anti
gb:Human propionyl-CoA carboxylase beta-
inositol(myo)-1(or 4)-monophosphatase 2
karyopherin alpha 2 (RAG cohort 1, impor



KIAA01 12 protein; inmolog of yeast ribos 
K1AA0175 gene product 
KIAA0193 gene product 
lactate deiiydrogenase A 
lactate dehydrogenase B 
protein phosphatase 1, regulatory (inhib 
lectin, galactoside-binding, soluble, 1 
■ , galaotoside-bindirg, soluble, 1 



en reading frame 2 
mitotic spindle coiled-coil related prot 
multifunctional polypeptide similar to S 
nuclear RNA helicase. DECDvarianI of I3E 
nucleoside phosphorylase 
ornithine decarboxylase 1 
cyclin-dependent kinase inliibitor 1 A (p2 
phosphofructokinase, platelet 
phosphoglycerati ' - ■ ■ ■ 



pleiotrophin (heparin binding gtowtli fac 
polo (Drosophial-iike kinase 
procollagen-lysine, 2-oxoglutataie 5-dia 



AW411491 Hs.75069 

AW411425 H3.180655 

BE018138 Hs.24447 

BE259035 Hs.1 18400 

BE250944 Hs,183556 

AAi12748 Hs.279905 

Z49105 '■HS.2B9105 



ESTs 

serine (or cysteine) proteinase inhibito
protein regulator of cytokinesis 1
protein regulator of cytokinesis 1
proteolipid protein 2 (colonic epitheliu
RAB6 interacting, kinesin-like (rabkines
flap structure-specific endonuclease 1
RAD51 (S. cerevisiae) homolog (E coli Re
ras-related C3 botulinum toxin substrate
regulator of G-protein signalling 2, 24k
replication protein A3 (14kD)
ribonucleotide reductase M2 polypeptide
S100 calcium-binding pro

serine/threonine kinase 12
Sigma receptor (SR31747 binding protein
singed (Drosophila)-like (sea urchin fas
solute carrier family 1 (neutral amino a
clone HQ0310PRO0310p1 



hi-lo-hi 
lii-lo-hi 
hi-lo-hi 



hi-lo-hi 
hi-lo-hi 
hWo-lii 
hi-to-hi 
hi-lo-hi 
hi-lo-hi 
hi-lo-hi 
hi-lo-hi 
hMo-hl 
hl-lo-hi 
hi-lo-hi 



hi-lo-W 
hi-k)-hi 
hi-lo-hi 
hi-lo-hi 



hi-lo-hi 
hl-lo-hl 
hi-lo4ii 
hi-lo-hi 



hl-lo-hi 
hi-lo-hi 
hi-lo-hi 
hi-lo-hi 



hi-to-W 
hi-lo-hi 
hi-lo-hi 
hi-lo-hi 
hi-lo-hi 
hi-lo-hi 















U66618 










AF230662 










AA622037 




programmed cell death 5 






X02308 


Hs.82962 


thymidylate synthetase 






BE264974 


Hs.6566 


th/roid hormone receptor interactor 13 




131877 


J04088 


"Hs.1 56346 


topolsoiTietase (DN/V) II alpha (170kD) 


hi-io-hi 


100866 


U14134 


Hs.75113 


general t'anscrlptlan faitor IIIA 


hi-k)4il 




AI434699 




transfenln receptor (p90, CD71} 






AA311426 












Hs.1 54036 


tumor suppressing subtransferable candid 
















NM_006002 










NM_007019 








103556 


Z19002 


Hs.37098 


zinc linger protein 145 (KruppeHike, e 


hl-lo^i 












133015 


AJ002744 


Hs.246315 


UDP-N-acetyI-alpha^alaclosamine:palyp 


hi-lo-hl- 




NM_001360 




7-dehydiDcholesleiDl leductase 






AF207664 




a disintegiin-like and matalloprotease'l 




300023 


AV560804 


Hs.301417 


AHNAK nucleoprotein (desmo/okln) 






U80899 


"Hs.301417 


AHNAK nucleoprotein (desmo/okln) 
















AW162057 


Hs.78629 


ATPase, Na*K* transporting, lieta 1 poly 




318538 


A1750979 




Homo sapiens ctone 24651 mRNA sequence 






A1878825 


Hs.323459 


caveolin 1. caveolae protein, 22kD 










CHJ(_hs gil5868838 
CH22„FGENES.369_12 






























Cn2<_FGENES.604_5 






AW475081 


HS.172928 


collagen, type 1, alpha 1 






AU077196 


Hs.B2985 








BE387561 




DKFZPdBdmI 523 protein 






AL)077333 


Hs.1d0483 








AU077333 


"Hs.160483 










"Hs.306201 


hypothetical protein DKFZp58401278 






H84730 


Hs .326391 


ESTs, HigNy similar to KIAA1437 protein 






AB037858 




hypothetical protein FLJ10337 




304049 






gb:yb98h03.s1 Stratagene lung (937210) H 




304735 


AA576453 




gb:nm75h11.s1 NCI_CGAP_Co9 Homo sapiens 




306999 


Al 138628 


Hs-308058 


EST, Weakly similar ta zinc finger prol 






AW368576 










AB037858 




hypothetical protein FLJ10337 






AB037858 










AK001691 










AA328102 










AL047586 


Hs.10283 


RNA binding motif protein 8B 






AI188161 










AA766605 


"Hs.47099 


hypothetKai protein FU21212 






AL1 09729 




ESTs, Highly similar to A31026 probable 






BE1 59395 


Hs.87089 








AW134519 


Hs.96125 








AA446628 




cartilage linking protein 1 






A1637471 


HS.107B01 






128515 


BE395085 




type 1 transmembrane protein Fn14 








Hs.1 80059 


Homo sapiens cDI^IA FU20653 fls, ctone KA 






AA749230 


Hs.22666 








NU_017413 


Hs.303084 


apelin; peptide ligand for APJ receptor 






AA348031 








300258 


AI478933 


Hs.188260 








H94900 












Hs.133159 


ESTs, Weakly similar to PIHUSD salivary 






AW450461 


Hs.203965 








AI284219 


Hs.130749 








AA679430 










AI735759 


Hs.52520 






322826 


AI807883 


Hs.201771 


ESTs 


hi-lo-lo 


324867 


AI624707 


"Hs.5921 


Homo sapiens cDI^ FLJ21592 Ss, cbne C 


hi-lo-lo 


331336 


AA287450 


Hs.93842 


Homo sapiens cDNA: FLJ22554 fis, clone 


hi-lo-lo 


331353 


AA953006 


Hs.88143 


ESTs 


hi-lo-lo 


133063 


AI654133 


Hs.30212 


thyroid receptor interacting protein 15 


hi-lo-lo 


311034 


BB67130 


Hs.311389 


ESTs, Moderately simitar to Pr0375 natur 


hi-lo-lo 


108647 




Hs.44276 


homeoboxCIO 


hi-lo-lo 


124965 




"Hs.324841 


hypothetical protein FU 22622 


hl-lo-lo 


113923 


AW9534a4 


Hs.3849 


hypothetical protein FU22041 similar to 


hi-lo-lo 


310557 


AI431798 


Hs.164192 


ESTs, Weakly similar to Y161_HUMAN HYPOT 


hi-lo-lo 


302943 


AI5ei344 


Hs.127812 


ESTs, Weakly similarto T17330 hypotheti 


hi-lo-lo 


128453 


X02761 


■Hs.287820 


fibronectin 1 


hl-lo-lo 


305232 


AA670052 


Hs.16g476 


glyceraMehyde^ptnsphate dehydrogenase 


hi-lo-lo 
NM_005756


Hs.184942 






133666 


U56725 


Hs75452 


healstiock70kD protein 2 


hi-io-io 














S69027 




























AF052107 


Hs. 90797 








W05150 


"Hs.37034 


Homo sapiens raRNA; cDNA DKFZp564H1916 (f 




302290 




Hs.175563 
Hs.8065d 


Homo sapiens mRNA; cDNA DKFZp564N0763 (f 










KIAAOOSB^proteln " ^ 






NM-002206 




inlegrin, alpha 7 






AI351642 














KIAA01 93 gene product 






BE265848 


Hs.2B9080 


















B^B6801 










AF001691 








133050 


X73424 


Hs,63788 


pro^onyt Coenzyme A caiinxylasB, beta p 


hi-lo-k) 




AI186<I31 












Hs,2C166 








AA67B403 












'Hs.105314 


relaxin 1 (HI) 






BE1 84455 


"Hs 251754 






103240 


U81961 


Hs.2794 


sodium ciiannel, nonvoltage-gated 1 alpiia 


hl-lo-k) 




AA366037 










AI674383 










AA283809 


Hs. 184601 








































Hs.21858 








AW301993 


Hs. 73980 


troponin Tl. slceletal, slow 








'Hs.10319 


UDP giycosyitranslerase 2 family, polype 










vasoactive intastinai peptide receptor 1 








Hs3C0816 ' 


viilln 2 (ezrin) 

Honfio sapiens mRNA; cDNA DKFZp564l172 (Ir 




132618 


AL050025 


'Hs.279916 


hypothetical protein FU20161 


io-hi-hi 






'Hs.78183 








AL038450 


Hs.48948 








NM_013230 


Hs,286124 


















AW602166 




CEGP1 protein 






AA557550 






























BE379594 




ESTs, Moderately similar lo AUU7_HUMAN A 










hypothelical protein FLJ10890 










gb:QV1-HT0573-29020O^)92-IO6 HT0573 Homo 

KIAA1494 protein 




104189 


AB040927 


"Hs,30098 
Hs,301804 


io-hi-hl 














AA834664 






















Hs. 273294 


hypothetical protein FU20069 








■Hs!ffl3683 


Apobec-1 complementation factoid APOBEC- 




113803 


AW880709 




chromosome 8 open reading frame 4 


io-hi-H 




AA601038 














Homo sapiens cDNA: FLJ23241 fis, cbne C 


















Hs 295971 




















Hs 173830 
















131524 


AB040927 


Hs,301804 


KIAA1494 protein 


io^i-hl 


132116 


AW960474 


Hs.40289 


ESTs 


Io-hi-hi 


132442 


AW970859 


Hs.313503 


ESTs 


io-hi-hi 


310219 
310598 


AI221087 
AI439136 


Hs,147761 
Hs,140546 


ESTs 
ESTs 


io-hl-N 
Io-hi-hi 


310884 




Hs.232189 


ESTs 


io-hi-hi 


311587 




Hs.271019 


ESTs, Weakly similar to SMN1_HUIIMN SURVi 


Io-hi-hi 


312240 


R3647T 


Hs.24321 


Homo sapiens cDNA FU12028 lis, done HE 


Io-hi-hi 


312803 


AA677934 


Hs.1 17864 


ESTs 


io-hi-H 




AA262331 


Hs,48376 


Homo sapiens done HB-2 mRNA sequence 


lo-hi-hi 


315052 


AA876910 


Hs.134427 


ESTs 


Io-hi-hl 


331919 


AA446869 


Hs.119316 


ESTs 


io-hi-hi 


133240 


AK001489 


Hs.242894 


ADP-ribosylationfector-llkel 


lo-hiJii 
134006
124847 
129087 
131762 
129000 



302357 
113231 
111923 



315368 
115084 



AI348027 
AA744902 
AA744902 
A1122843 



NM_00M71 

X03178 

AA278583 



AI339046 
ABOSTy-B 
AW068579 
BE383668 

AA373314 



Hs.7837 

"Hs.304177 

Hs.108557 

"Hs.107757 

"Hs.107767 

■Hs.184319 

Hs.48994 



Hs.1 06534 

"Hs.167017 

Hs.198246 

Hs.180737 

Hs.25925 

Hs.1 83476 

Hs.107637 

Hs.104696 

Hs.7780 

"Hs.42484 



AF188747 
AA535210 
AW949068 



Hs.1 71 995 
Hs.125511 
HS.204C96 

AA021459 Hs.306480 

NM_002436 Hs.1 861 

AF127577 Hs.155017 

AB001914 Hs.170414 

U818C2 Hs.154846 

AW379130 Hs.1 8953 

N98569 Hs.76422 

NM_005026 Hs.78589 

AF034799 Hs.3C881 

AA021459 Hs.306480 

U83g93 Hs.321709 

U97276 Hs.77266 

BE244053 Hs.79362 

X80821 Hs.302177 

AU077115 Hs.201675 

NM_006379 Hs.171921 

W26406 Hs.295923 



134921 AL137491 



130760 

134032 
303762 
110932 
135192 
133886 
134142 
100877 



127435 
110520 
114660 



105402 
102976 
101793 
129890 
328164 
328648 
330032 



337603 
338561 
338562 
333743 



AA071383 

NM_002038 

AA506324 

NM_000481 

AA535210 

AU076801 

X6362g 

AW1 76909 

AB014680 

AU077174 

W01076 

AI868872 



Hs.265827 

Hs.1852 

Hs,102 



Hs.42346 
Hs.8786 
"HS.2B8181 



G-protein-coupled receptor induced prote
Homo sapiens clone FIJ8503 PRO2286 mRNA
Homo sapiens clone PP1057 unknown mRNA
hypothetical protein PRO1489
hypothetical protein PRO1489
ESTs, Weakly similar to KIAA1000 protein
gb:za46c11.s1 Soares fetal liver spleen
ESTs, Weakly similar to AF151300 1 CGI-4
gb:zo20f10.s1 Stratagene colon (937204)
ESTs, Weakly similar to MCAT_HUMAN MITOC
hypothetical protein FLJ22625
gamma-aminobutyric acid (GABA) B recepto



Homo sapiens cbne 231 
Homo sapiens clone 23860 mRNA sequence 
Homo sapiens done 25061 mRNA sequence 
hypothetical protein FU12806 
K1AA1324 protein 

Homo sapiens mRNA; oDNA DKFZp564A072 (fr 
hypothetical protein FLI10618 
Homo sapiens mRNA; cDNA DKFZp586P1622 (f 
KIAA0493 protein 

insulin-like growth factor binding prote 

kallikrein 2, prostatic 

kallikrein 3, (prostate specifc antigen 

kallikrein 3, (prostate specific antigen 

Homo sapiens mRNA; cDNA DKFZp434P1530 (( 

lipophilin B (uteroglobin family meraba) 

Homo sapiens mRNA; cDNA DKFZp761E2112(f 

membrane protein, palmitoylated 1 (55kD) 

nuclear receptor Interacting protein 1 

paired basic amino acid cleaving system 

phosphatidylinositol 4-kinase, catalytic 



phospholipase A2, group IIA (platelet?, 
serine (or cysteine) proteinase InNbilo 
protein tyrosine phosphatase, receptor t 
Homo sapiens mRNA; cDNA DKFZp761E21 12 (f 
putlnetgic receptor P2X, llgand-gated lo 
quiesdn 06 

retinoblastoma-Iike2(p130) 
H.sapicns mRNA for ribosomal protein L18 
RNA binding motif protein 5 
sema domain, immunogioby'in domain (Ig), 
seven in absentia (Drosophila) nomolog 1 
sialyltransferase 1 (beta-galacfoside si 
TAR (HIV) RNA-bincing protein 1 
Homo sapiens cDNA FLJ13613 fis, done PL 
lectin, gaiaclos'de-binding, soluble, 8 
gb:zm61dG5.r1 Stratagene fibroblast (937 
interferon, aipha-induciblc protein (do 
acid pliospliatase, prostate 
aminomettiyltransferase (glycine cleavage 
kallikrein 3, (prostate specific antigen 
cadlienn 17, Llcadherin (llver-intestin 
cadtterin 3, type 1, P-cadtierin (placenta 
caldneurin-binding protein calsarcin-1 
carbohydrate (chondroitin 6/keratan) sul 
cathepsin H 

CD59 antigen p18-20 (antigen Idenlilied 

Homo sapiens cDMA: FLJ22704 lis, clone H 

CH.06LhS 9115868068 

CH.07Jsgl|6004473 

CH.16_p2gl|6682596 

CH.16_p2gl|6682596 

CH.20Jsgl|6552458 

CH22_C20H12.GENSCAN.16-2 

CH22_EM:AC005500.GENSCAN.421-5 

GH22_EMAC005500.GENSCAN.421-6 

CH22_FGENES.264_1 

GH22_FGENES,2gO_3 

CH22_FGENES,290_8 

CH22_FGENES,36C 1 

CH22_FGENES,360_3 

CH22_FGENES.406J 

GH22_FGENES.41-1 

GH22_FGENES.46-1 

CH22_FGENES.527_2 

CH22_FGENES.527_3 



lo-hi-hl 
lo-hi-hl 
b-hi-hl 



lo-hi-hl 
lo-hi-hl 
lo-hl-hl 
b-hl-hl 
lo-hl-hl 
lo-hf-hi 
b-hl-hl 
b-hi-hl 
b-hl-hi 
b-hl-hi 
b-hl-hl 
b-hi-hl 



lo-hl-hl 
lo-hl-hl 
lo-hl-hl 
iD-hl-hl 
iD-hl-hi 
lo-hl-hl 
b-hl-hl 



lo-hl-b 
lo-hi-b 



b-hi-b 
to-hWo 
b-hl-b 
b-hMo 
ta4il-ta 



lo-hMo 
lo-hi-to 
lohi-to 
lo-hi-b 
lo-hi-b 
b-hi-b 
b-hi-b 
b-hl-b 
b-hl-b 
b-hl-b 









CH22_FGENES.617_7 










CH22_FGENES.683_3 










CH22_FGENES.81_8 










chromosome 21 open reading frame 5 




130380 


AI949359 


Hs!l43600 


type II Golgi nnembrane protein 


k>-hi-lo 


102962 


R50032 


Hs.159263 


collagen, lype VI, alpha 2 


lo-hi-lo 












AA448090 




ESTs, Highly similar to RB18 MOUSE RAS-R 
















Al523875 




gb:tg97d04.x1 NC1.CGAP„CLL1 Homo sapiens 






AA495930 




Homo sapiens c[3MA: FLJ221 65 lis, dona H 






AA010200 










AA525775 


Hs.292523 










Hs.221612 








AI681545 


Hs,152982 


hypcthelical protein FIJ13117 








Hs. 107515 








AB033100 


H3,300646 


KIAA protein (similar to mouse paladin} 




321696 






amplified In osteosarcoma 
gb:yiJ66f10.r1 Weizmann Olfactory Epithel 






AI242754 


Hs. 137306 


















AK002088 




Homo sapiens cDNA FLj 11 226 fis, done PL 










Homo sapiens cDNA: FLJ21 904 fis, clone H 






AA301270 




gb;EST14192 Testis tumor Homo sapiens cD 






AK002161 




yeast Sec31 p tioraoiog 




323835 


AL042005 








323926 


AA354572 




gb:EST62857 Jurlcat T-celis V Homo sapien 








Hs. 271 340 








AA884756 




gb;am20a10.s1 Soares_NFLJ_GBC_S1 Homos 






AA612626 




Homo sapiens cDI^ FLJ13752 lis, done PL 






AA(J75481 




fanib'n, light polypeptide 








HS.Z74323 


Homo sapiens. Similar to sialyltransfera 






AA325633 


Hs.136102 


KIAA0853 protein 








HsJ12B79 








Ml 15962 


Hs.323423 


ESTs, Moderately similar to B Chain B, 






AA0B2000 




gb:zn26f07.r1 Stratagene neuroepllliBiiim 








HS.Z72572 


hemoglobin, alpha 2 






AA062837 




gb:zm05b1 1 .si Stratagene corneal stroma 


























AA759177 


Hs.298148 


ESTs, Weakly similar to KIAA0565 prolel 






AK000142 


Hs. 101774 


hypothetical protein FLJ23045 






AW080585 




gb;xc33f09.)c1 NCLCGAP_Co1B Homo sapiens 






AI239923 


Hs.30098 












Homo sapiens clone 23860 mRI^ sequence 










Homo sapiens cDNA FLJ1 2935 fis, clone NT 






C04863 




















AA609684 




Homo sapiens cDNA: FLJ21543 fis, clone C




















Homo sapiens clone 23860 mRNA sequence






A1264847 




Homo sapiens cDNA FLJ12935 fis, clone NT




























Hs.1 12748 


Homo sapiens cDNA: FLJ21543 fis, clone C






AW975028 


Hs.1 02754 


















AA807881 










AW296584 


Hs.293782 








AL1 37281 




Homo sapiens mRNA; cDNA DKFZp434C2016 (f






AW385224 


Hs.35198 


ectonucleotide pyrophosphatase/phosphodi








Hs.301526 


hypothetical protein FLJ13181 






BE242691 




ESTs, Weakly similar to ALU1_HUMAN ALU S




112098 




Hs.1 03795 


Homo sapiens cDNA FLJ13136 fis, clone NT






BE246743 


Hs.288529 


hypothetical protein FLJ22635




11 2902 


AL035633 


"Hs.1 291 90 


Human DNA sequence from clone RP5-1046G1






AW024162 










BE379794 


Hs.65403 


hypothetical protein 




118739 H01463 Hs.93534 ESTs lo-hi-lo
119267 AA064970 Hs.118145 ESTs lo-hi-lo
120570 AA280679 ESTs, Weakly similar to ALU1_HUMAN ALU lo-hi-lo
121176 AL121523 Hs]97774 ESTs lo-hi-lo
123360 AA532718 Hs.178604 ESTs lo-hi-lo
123974 NM_015578 neurobeachin lo-hi-lo
124777 R41933 gb:yg04f09.s1 Soares infant brain 1NIB H lo-hi-lo
128046 AA873285 gb:oh68h05.s1 NCI_CGAP_Kid5 Homo sapiens lo-hi-lo
128666 AA808466 Hs.103395 hypothetical protein FLJ14146 lo-hi-lo
130639 AI557212 Hs.17132 ESTs lo-hi-lo
130693 R68537 Hs.17962 ESTs lo-hi-lo
131756 AA443966 Hs.31595 ESTs lo-hi-lo
131985 AA50a020 Hs.36563 hypothetical protein FLJ22418 lo-hi-lo



WO 02/098358



PCT/US02/17594















BE326276 


"Hs.8861 








AA555209 


Hs.259439 








AW291411 




ESTs, Weakly similar to S00754 zinc fing




302595 


AI699372 


Hs.1 93247 


Homo sapiens mRNA; cDNA DKFZp434A171 (fr




303132 


AI929819 










AA340605 


Hs.1 05887 


ESTs, Weakly similar to Homolog of rat Z






BE246743 


Hs.288529 


hypothetical protein FLJ22635




310C26 


AA278233 


Hs.100691 








AI253072 








310353 


AI261700 










AI262584 


Hs.145575 








AI670843 


Hs. 200257 








AW022192 


Hs.200197 








Al 277603 


Hs.145990 








AW262580 




KIAA1621 protein 








Hs!209115 










Hs.101316 




lo-hi-lo






Hs .2061 32 








AA682393 


"Hs.1 19237 










Hs.302251 










"Hs,127453 








AW450103 










AW293341 




ESTs, Weakly similar to 138022 hypotheti






AW970985 


Hs,290853 








AI248774 


Hs.1 26707 


hypothetical protein FLJ11457






AA699325 










AI676164 


Hs.204339 








AI801098 


Hs.151500 








AA9 27670 










AW960454 


Hs.222830 






313689 


AI608810 


Hs.193288 








AI827237 


Hs.282884 








AI280112 


Hs.1 25232 


Homo sapiens cDNA FLJ13266 fis, clone OV






AI867931 


Hs.1 64595 








AA602917 














ESTs, Moderately similar to ALU5_HUMAN A 
















AA806538 




KIAA1575 protein 




315074 


M828284 


Hs]l36729 


Homo sapiens cDNA: FLJ21348 fis, clone C


lo-hi-lo


315214 


AI915927 


Hs.34771 


ESTs 


lo-hi-lo




AW292176 












Vb!279610 


hypothetical protein FLJ10493 








Hs.293696 










Hs.1 84780 








AA292998 










AW515373 


Hs.271249 


Homo sapiens cDNA FLJ13580 fis, clone PL






AW136397 










AI469960 


Hs.1 70698 






316244 


AI640761 




ESTs 






AW139408 










BE540C9C 
AA889055 


Hs.1 221 56 








AI660898 










AW138241 


Hs.210846 


















AIB09444 


Hs.202108 








AI806867 










AW071851 


Hs.130628 








AI565(y71 


Hs.1 59983 








AI986208 


Hs!244760 












gb:seq3329 1-NIB Homo sapiens cDNA clone








Hs.43838 








AH 6001 5 












Hs.269109 








AWffi^63 


Hs.246240 








AW294316 










AW972832 


Hs.29468 








AA101697 


Hs.211270 






323045 Hs.188836 ESTs lo-hi-lo
323091 AI902456 Hs.210761 ESTs lo-hi-lo
323262 AL133990 Hs.190642 ESTs
323410 AW118683 Hs.154150 ESTs
323645 AW445014 Hs.197746 ESTs
324593 AW972227 Hs.163986 Homo sapiens cDNA: FLJ22765 fis, clone K
324656 T78413 Hs.293696 ESTs
324674 Hs.115831 ESTs lo-hi-lo
324713 AI093930 Hs.313466 ESTs
324790 AI334367 Hs.159337 ESTs lo-hi-lo
324804 AI692552 gb:wd73f12.x1 NCI_CGAP_Lu24 Homo sapiens lo-hi-lo
330728 AI905520 Hs.29672 ESTs lo-hi-lo
330760 H04588 Hs.30469 ESTs lo-hi-lo
331028
331046 
331050 
331053 
331180 
331313 
331337 



332265 
3323U 
131517 
315352 
315498 
321489 
106099 
105726 



331492 
110837 
330814 
312226 
102034 
134671 
131083 
309575 



133731 
303297 
108732 



104933 
302235 
320574 
324678 
331022 
332430 
330601 
101988 



311251 
314171 
106096 
133740 
119521 
119546 
119559 



N94126 
AL049987 
AL049443 



ESTs, Moderately similar to ALU1_HUMAN A
Human DNA sequence from PAC 75N13 on chr
hypothetical protein



AI459177 

NM_012068 

NM_012C68 

AI820719 

AI673735 

AI741606 

AW972771 

BE541042 

AL359588 

AI889208 

AA315993 

AK001114 



AW1 68096 

D86962 

NM_005518 

N77976 

N71725 




AI13074C Hs.6241 
AI655662 Hs.197698 
A1821895 Hs.193481 
AW379378 Hs.170121 
S.170160 



H3 histone, family 3B (H3.3B)



ESTs
ESTs

hypothetical protein FLJ23045

sema domain, transmembrane domain (TM),

ESTs, Moderately similar to ALU1_HUMAN A

ESTs, Moderately similar to ALU1_HUMAN A

ESTs, Moderately similar to ALU7_HUMAN A

activating transcription factor 5

activating transcription factor 5

DnaJ (Hsp40) homolog, subfamily A, membe

ESTs, Weakly similar to ALU1_HUMAN ALU S

ESTs, Weakly similar to ALU1_HUMAN ALU S

ESTs, Weakly similar to ALU1_HUMAN ALU S

Homo sapiens cDNA FLJ13496 fis, clone PL

hypothetical protein DKFZp762B226

hypothetical protein FLJ10890

Homo sapiens regenerating gene type IV m

hypothetical protein FLJ10252

HT018 protein

ESTs, Weakly similar to transformation-r
ESTs

fibromodulin

FK506-binding protein 9 (63 kD)
gamma-aminobutyric acid (GABA) A recepto
glyceraldehyde-3-phosphate dehydrogenase
growth factor receptor-bound protein 10
3-hydroxy-3-methylglutaryl-Coenzyme A sy
hemoglobin, alpha 1
hemoglobin, alpha 2

Homo sapiens clone 24468 mRNA sequence
ATP synthase, H+ transporting, mitochond
ATP synthase, H+ transporting, mitochond
ATPase, aminophospholipid transporter (A
Homo sapiens mRNA from chromosome 5q21-2
hypothetical protein

Homo sapiens mRNA; cDNA DKFZp564F112 (fr
Homo sapiens mRNA; cDNA DKFZp586N2020 (f
ORF

HT018 protein

hypothetical protein FLJ22489
Homo sapiens cDNA: FLJ21930 fis, clone H
hematopoietic PBX-interacting protein
major histocompatibility complex, class



A/>d55986 Hs.232068 
M31669 Hs.1735
NM_014785 Hs.47313



KIAA0711 gene product
KIAA0884 protein
KIAA1025 protein
KIAA1051 protein
KIAA1105 protein

Homo sapiens LUCA-15 protein mRNA, splic
degenerative spermatocyte (homolog Droso

transaldolase 1

peroxisome proliferative activated recep
phosphoinositide-3-kinase, regulatory su




lo-hl-to 
lo-hMo 
lo-hNo 
b-hi-lo 
lo-hHo 
lo-hi-lo 
lo-hMo 
lo-hl-lo 
lo-hl-Io 



lo-hl-lo 
lo-hi-lo 
lo-hl-lo 
lo-hl-lo 
lo-hl-lo 
lo-hl-lo 



lo-hl-lo 
lo-hi-lo 



lo-hMo 
lo-hHo 
lo-hHo 
lo-hl-lo 
lo-hl-lo 
lo-hl-lo 
lo-hl-lo 
lo-hl-lo 
lo-hl-lo 
lo-hl-lo 
lo-hl-lo 
lo-hl-lo 



lo-hHo 
lo-hHo 
lo-hl-lo 
lo-hl-lo 

lo-hi-lo 
lo-hi-lo 
lo-hl-lo 
lo-hi-lo 
lo-hi-lo 
Ml 78955








300566 


R34926 


Hs .326392 










Hs.108447 


















AA037534 










AI750878 


Hs.87409 


















N22401 










M0591 55 




gb:zm10f11.s1 Stratagene pancreas (93720
















AA641695 




gb:nr62h10.s1 NCI_CGAP_Lym5 Homo sapiens






AA071383 




gbam61d05.r1 Stratagenelibniblast (937 






AA069820 


Hs.1 80909 










Hs!323423 


ESTs, Moderately similar to B Chain B,






AA075424 


Hs.325505 


ESTs, Moderately similar to HBA_HUMAN HE






AA075601 










AA079347 




gb:zm96o06.s1 Stratagene colon HT29 (937






AA079409 




gb:zm96h02.s1 Stratagene colon HT29 (937 




















gb:zn07h10.r1 Stratagene hNT neuron (937
















AA467736 


Hs.275437 










"Hs. 182183 


























338038 






CH22IeM:ACC05500.GENSCAN.149-9 


















Hs .80706 


diaphorase (NADH/NADPH) (cytochrome b-5


io-hi-lo-hi 








gb:yg87b07.s1 Soares infant brain 1NIB H


























AW023595 


Hs.232043 


















AI740792 


Hs. 167531 






















growth arrest and DNA-damage-inducible,






NhL012445 


'Hs, 2881 26 






























Hs.87539 


aldehyde dehydrogenase 8 










BCL2/adenovirus E1B 19kD-interacting pro






AU076820 










AU076743 














CH.05 hs gl|5867968 






AA262294 










NM_001975 


"Hs.146580 




b-lo-hi 




AA194776 




















AK000742 


Hs!l 25774 


L2DTL protein 


lo4o-hi 




AA896986 




gb:al06a08.s1 Barstead spleen HPLRB2 Hom






























hypothetical protein FLJ22316












































































AA305599 




hypothetical protein PRO2013 


















HS.169B98 






321024 


AW246216 




Homo sapiens C1orf19 mRNA, partial cds
















AA306997 


Hs!2683S2 


ESTs, Weakly similar to hypothetical pro
















AA334511 


Hs.26638 


ESTs, Weakly similar to unnamed protein






A1580127 








128896 T53925 Hs!l07^^ fibrinogen-like 1 lo-lo-hi
AV552066 Hs.75113 general transcription factor IIIA lo-lo-hi
103245 BE566343 Hs.28988 glutaredoxin (thioltransferase) lo-lo-hi
314785 AI538226 Hs.32975 guanine nucleotide binding protein 4
103677 Z83806 gb:H.sapiens mRNA for axonemal dynein he lo-lo-hi
131170 NM_014253 Hs.23796 odz (odd Oz/ten-m, Drosophila) homolog 1 lo-lo-hi
131164 AW013807 Hs.182265 keratin 19 lo-lo-hi
100409 D86957 Hs.80712 KIAA0202 protein lo-lo-hi
133167 AW162840 Hs.6641 kinesin family member 5C lo-lo-hi
319080 AW967646 Hs.23023 ESTs lo-lo-hi
330706 AF097994 Hs.301528 L-kynurenine/alpha-aminoadipate aminotra lo-lo-hi
104052 NM_002407 Hs.97644 lo-lo-hi
100547 M57417 gb:Homo sapiens mucin (mucin) mRNA, part lo-lo-hi
myosin-binding protein C, slow-type
novel Ras family protein
protein L7

protease inhibitor, Kazal type 1
ssociated polypepi"- " 
Ste-20 related kinase




vascular endothelial growth factor
v-fos FBJ murine osteosarcoma viral onco
v-fos FBJ murine osteosarcoma viral onco
FLJ22316

Homo sapiens cDNA: FLJ21409 fis, clone C
gb:zn30d02.s1 Stratagene neuroepithelium
gb:zn04d0ar1 Stratagene hNT neuron (937 
gb:zn29d08.r1 St 



lo-lo-hi
lo-lo-hi
lo-lo-hi
lo-lo-hi
lo-lo-hi
lo-lo-hi
lo-lo-hi
lo-lo-hi
lo-lo-hi
lo-lo-hi
lo-lo-hi
lo-lo-hi

lo-lo-hi

lo-lo-hi
Pkey: Unique Eos fn
CAT number: Gene cluster number
Accession: Genbank ac

Pkey CAT Number 
108462 116651.1 

108489 11866?.1 

101216 17379 1 AA284166 AA314707 L25876 L27711 AA092745 N92C87 U02681 AA315766 BE385121 AA352693 NM.005192AI739135 AI066521 AW173105 
AI277603AI277601 AI300268AWig5846AI70B510
303274

303297

302656




302767
319080
316486



311251
319109
318538



22162_1 
10778_1 



1623718..1 
18CS43J 
81C941.1 



318541 
317916 
317939
303502
303506

310787
302943
318617



302970
319289
312073 



1667176J 
54C437_1 
542313J 
325188J 
74515_2 

723998J 



AK001468 M19C315 M374980 AW961 179 M307782 AA315295 M347194 AW963C73 AW368190 AW368192 AA280772 AA251247 N85676 
AI215522AI21 6389 N87B35R1 2261 ra7094AI660045AA347193R16712AW119CC6N55905 N87768AW900167AI341261AI818674D20285 
AI475165AA300756 R40626AI122827AA133250AI952488AA970372AA889845AWC6g517AI524385AA190314AI673359AA971105 
A1351088A1872789 AI919056 AI611216 AK001472 BE568761 AA581004 

AF070523 T80072 H08917 R35413 H14848T80C74 C15452 D81744 F05382 F05380 Z45148 R18285 AI634532 BE549752 AW299752 
AW090717AI693471 H08831 BE2m66AI373383AW1377C2AI241235 R4921CR38766AA757779 R38765AI498410AI693124AI648374 

R14143 

BE090B80 R96998 AA091152AA4B867BAA644573AA563967 BE0905B4AA079122 N79188 R95018AW958397AA190398 AA563719 
AA379530 AA280050 AA1 90542 AW328142 AA306992 AA383598 AW293005 BE254231 BE018829 BE207008 AW247508 AW3281 43 
AI888789AI953071 BE617691 AW245093 AW079089AI825722AA102386AA621823 AA486490 AI286316AI63B534 BE551712 N62469 
AA903777 AA991450 AI056209 AA079223 AA707656 AA442421 R94934 AA539374 BE613108 AA056180 AA046427 
AK001379 AKC0141 1 AW79571 1 T06997 AA287540 AA354538 AW957773 AI632268 AI651003 AI689650 AI809332 AW3C4483 AI805259 
AA278506AA862381 AA287875AW628545AIO85761AWO25965AI658615AW628879AW139496A1214278AA9C2745AA991679 BE540102 
AW593658 AI745502 AA744687 AI285441 AA807089 AI218314 AA721 449 AI202987 AA432129 A1285502 A1281 462 AA73131 9 BE082573 
NM.014785 D87447 BE263434 AA400883 AW4078B1 A116051 5 N51680 AW583B55 AA844421 A1274202 BE019777 AW998722 A)420586 
AI612828 AI755501 AW01 5434 AI955032 AL133780 AA928914 BE548610 D31490 AL048391 BE552480 A1796059 AW1 73479 AA341 631 
AI93461 1 AI274836 AA373732 AA525028 AI571392 AI392971 AI738589 A1953828 AI061 125 AW772523 AI361 106 AW883276 R45884 
AI366652 AW236104 AI873069 F15747 AI362185 AI360910 AI419573 AA974612 AI143525 AA995238 AI214649 AI591399 BE170850 
BE163405 

AW207582 AI962335 AI632618 BE504857 
AI431798AW418B36AI307777BE274992AI910729AW751094 
AI43gi36 AI338013 AW204095 AI910519 AW977(I64 
H94900 N3g891 

AW967646AA251431 Z45131 R20502AI911796AA234C20 AA232982 H29165 
T23514AI655785 

A1681545AI951714AI570397AW873588AA836396AI359986A1499790AA773477A1951615T07547AW304709AF114041 BE176629 

Z44580 T30422 T32690 AW953065 H10602 

AI655662AW014514A1686482 

Z45662 AA282123 H10149 AA505157 W92511 N78341 

AI750979 AI69ai64 AI807700 AI681C67 N35860 N28625 R98369 R53158 R56501 AI7502g2 AA319987 BE1 22902 AA094362 T36150 Z30223 
T34600 H06612 F13507 BE615062 AA332035 T35478 R58469 T35542 AA12B518 R58400 H04119 AA329969 AI435429 N31656 AA151326 
AA1 51 327 T80239 AF07a648 H79097 AA7481 1 5 C02997 AA385870 H25456 H48665 H81 253 R54555 AA083618 R4801 4 R48397 BE61 5503 
BE61 5437 AA328253 BE531052 N45373 W06934 W45683 AA444383 AW369052 AA483867 W93600 R93256 R83439 W67400 AA461434 
AA493673 W94180 AA054776 AA151260 AA558674 C03776 C02719 C02874 R32667 W30825 AA463399 AA429967 AA502956 AA973501 
C02309 AAC37446 H44694 T6001 1 C05126 AL133639 H96749 AA30581 0 AA1 51524 AA304647 AA148902 AA730403 AA303439 T81041 
W40357 AA375204 BE122903 R77190 R77048 AF074993 AA034379 R81 032 R35976 AW798160 AI807741 A1985921 D61 845 C04231 
AI709069 AW340655 AI089307 AW072394 W45684 AA054588 AI660784 AI128025 AW00651 6 AW298028 AA034460 AW1 66802 AA080891 
AI336397 AA1 49439 N75560 AA57B13B AA972868 AA776324 AA662526 AI750291 AI760980 AA461 1 17 AA878326 AA693226 N90723 
74700 2
500789_1 

246486_1 AA495930 AI4708gO Hg7831 AA35035B BE166712 

c_x_lis 

CH22_4138FG_41J_ 
CH22 4167FG 46 1 
CH22_4244FG_83 17_ 
CH22_6535FG_LINK.EMAC00 
AI133628 

CH22_6g44FG_UNK_EM;AC00 
CH22.72g4FG_UNK_EM;AC00 
CH22_72g5FGL.UNK.EM.AC00 
333124 CH22_353FG_81_8_UNK_EM:A

333135 CH2^364FG_83J1_UNK_EM: 

333137 CH2i.366FG_83J3_UNK_EM: 

333138 CH22_367FG.83_15_UNK^M: 

333139 CH22_36BFGJ3J6_UNK_EM: 
303187 487417_1 AA115962 AA078794 
326213 c17Js 

333516 CH22_772FGJ73_1_LINK_EM: 

333517 CH22_773FG_173_2_LINK_EM: 
333743 CH22.1009FGJ64_1_LINK_EM 

333795 CH22 1053FG 275_1_LINK EM 

333796 CH22_1055FG 275_3_LINK_EM 
335044 CH22_2367FG_430_1_L1NK_EM 

333808 CH22_1077FG 279_2_L1NK EM 

333809 CH22J078FG_280_2_UN[<ZeM 
333845 CH22_1114FGJ90_3_UNK_EM 
333349 CH22 1118FG_290_8_UNK_EM 
335149 CH22_2484FG_499_5_UNK_EM 
305096 AA642964 

335289 CH22_2631FG.527JLUNK_EM 

335290 CH22J632FG_527.3JJNK3M 
335293 CH22J635FGJ27.6_UNK^M 
326816 C20J1S 

303951 AW475081 

305232 AA670052 

328164 c.6.hs 

305503 AA759177 

335682 CH22_3043FG_595_2_LINK_EM 

305612 AA782347 

335753 CH22J120FGJ04_2_LINK_EM 

335755 CH22_3122FG_604 4.LINK_EM 

335756 CH22_3123FG_604.5_UNICEM 

335809 CH22_3181FG_617J_UNK.EM 

335810 CH22_3182FG_617.7_UNK_EM 
328648 CH22.3197FG.619.11_UNK.E 
337182 CH22 5204FG 570 2 

307111 AI174S28 

330032 c16_p2 

330033 c16.p2 

337603 CH22_5896FG_LINK_C20H12. 

337674 CH22_6005FG_UNK^MAC00 

337675 CH22.6006FG_LINK_EMAC00 
337755 CH22.6105FG_L1NK^MAC00 
339186 CH22J120F(i_UNKJIA59H1B 
309390 AW0805B5 

309575 AW168096 

332792 CH22_8FG.3XLINK.C4G1.GE 

334101 CH22.1379FG.327.59JJNK.E 

304049 T58155 

334221 CH22J504FG.360J.LINK.EM 

334222 CH22J506FG.360.3.UNK.EM 
334282 CH22_1571FGJ69J2JJNK_E 
302910 386182.1 N77976 W03184 
325889 Cl6 hs 

327110 c21_hs 

304263 AA062837 

304275 AA070605 

304309 AA112147 ' 

334502 CH22_1802FG 397 18_LINK_E 

334578 CH22_1883FG 406J_LINK.EM 

304521 AA464716 

334616 CH22 1923FG 411J5_LINK.E 

304541 AA482561 

336054 CH22.3440FG.683.3_UNKJ3J 

304735 AA576453 

334891 CH22.2208FG.45i.6_UNK.EM 

334899 CH22J216FG_452_13_LINK_E 
306011 AA896986 

334900 CH22 2217FG_452 14 LINK.E 
334902 CH22_2219Fq.452_16_LINK_E 

334905 CH22_2222FG 452_2C_LINK E 

334906 CH22.2223FG_452.21.LINK.E 
334951 CH22_2272FG 465 20_LINK E 
327821 c_5_hs 

330416 13440J D83777 NM J14766 AA333003 BECC4425AL1 19670 AA323656 BE296006 AL11B935 BE256656 AA374227 BE271472 BE296326 AW583557 
AW583625 N40409 AW608433 AA324811 AA190746AW949591 BE000350AA350275 BE392178AA430618AA348536 AA366634AW818371 
AA31 7886 BE072912 BE072917 AA323887 W38798 AA322171 W46661 AA036818 AA309827 AW583615 AA378262 W25430 H97457 N42389 
AA169692 AA364115 H4218C AA081704 AA775719 AI185130 N75656 AW0061 17 AA984601 AI421 198 AA181467 AW511204 AA181639 
N64808AI937715AA169219AA088783AA548717AW238470AW662116AW166218D51086AI867027AA729243AI923221AI357913 
AI375759 AA987267 AA773569 AW500216 AA191460 AA633234T347S7 AA527048 C75239 N93172 AW12g534N33415 AJ239459 BE32B344 
AW418717 AI308347 H42999 N24779 AA621221 AI497806 AI418e55 AW41B718 AI089499 A[332S76 AI039047 AW533402 AA430S00 
330694

330706 
330714 



331131 
331180 
331278 



77CC43_1 
143843_1 
1335173. 



685_8 



16353_2 



172787_1 
191360_2 



AI271939 A1798736 AA6128C3 AW169919 A1183542 AA8430B5 C06884 C75127 AW04468C T03756 AW583349 AA082053 AA877439 
AI298253AA010549AW168981AI372978AI039490AI311909AI313396 W81554AI582863AI566169AA010548AA748398AI092356 
AI074928 AA862701 W46570 AI570312 AA582306 AI082059 AI452384 AI498938 AA953378 AA910381 AA987271 AW664437 AW583393 
T33340 H50310 AI361 354 T1 5902 AI280310 AW583343 T15989 AA995343 AA718958 AI277293 A1468250 A1860396 AI951 938 AA018659 
AI590916AI383915AI382782AA844109At016130AA812632AC004912AI091734AW893561 AW893559AA984413AA484993AA491098 
AA335299 BE208375 AI140834 AA0B8181 AI860314 AI738613 T70902 R42077 AI884558 AA489798 AI130828 AA0a9735 H25381 AW612425

R46801 H27507 H30105 H44671 AI631362AA558470AW014412AA552059 AA045801 AW5B9435AI039657H146UAA974256 R42078

AI245758 T61BB6 AI559202 AI07413g AI81731 3 AI041484 AA4371 38 AI61 3032 AI1478gi AI457945 AW1977Z7 Aia7439g AI758636 AI598048 
AA972077 M85390 R36989 R71936 AI867492 T40081 Z41115 AA772775 T41013 AI695691 740996 AIB26822Ng3464AWg55524AA0886S1 
335763
335755 
335756 
336662 
336684 
337603 



330032 
330033 
326213 
326816 
327110 



Strand 



TABLE 1C 

Pkey: Unique number corresponding to an Eos probeset
Ref: Sequence source. The 7 digit numbers in this column are Genbank Identifier (GI) numbers. 'Dunham I. et al.' refers to the publication entitled 'The DNA sequence of human chromosome 22.' Dunham I. et al. (1999) Nature 402:489-495.
Strand: Indicates DNA strand from which exons were predicted.
Nt_position: Indicates nucleotide positions of predicted exons.

Pkey Ref

333135 Dui

333137 Dm

333138 Dui

333139 Dui

333516 Dui

333517 Dm

333795 Dui

333796 Dui

333809 Dui

333846 Dui

333849 Dui

334101 Dui

334616 Dui

334891 Dui

334899 Dui

334900 Dui



7807688-7807795 
7808253-7808319 
7880600-7880775 
7830600-7880775 



335809 
335810 
335824 
336054 
336721 



8018323-8018472 
9973413-9973550 
15176123-15176470 
19299770-19299944 
1931! 
1931! 

19317083-19317195 
19322553-19322680 
19323493-19323590 
20342088-20842682 
21497441-21497587 
26310772-26310909 



29161685-29161937 



Dunham, I. et.al. Plus 



3971764-3971900 

8138219-8133392 

17089711-17089988 

3318017-3317932 

7573218-7573060 

12730944-12730387 

12732417-12732289 

13285293-13285178 



15004462-15004304 
20147708-20147502 
22305950-22305708 
22309950-22309891 
22316408-22316275 
25421215-25421093 
25761636-25761444 



25764330-25764251 



85177-85237 

86663-86723 

50751-60927 

198354-198436 

94608-94785 

131060-131232 

27080-27226 



107687-107765 
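
The Nt_position entries of Table 1C are start-end pairs, some written in ascending and some in descending order. As a minimal sketch of how one such entry could be read, the helper below splits a pair and records its order. The names `ExonSpan` and `parse_nt_position` and the inclusive-length convention are assumptions made for illustration; the source does not state how descending pairs relate to the Strand column, so the flag only records the written order.

```python
from typing import NamedTuple


class ExonSpan(NamedTuple):
    """One predicted exon, stored with low <= high (hypothetical helper type)."""
    low: int
    high: int
    descending: bool  # True when the entry was written high-low

    @property
    def length(self) -> int:
        # Assumption: both end coordinates are included in the exon.
        return self.high - self.low + 1


def parse_nt_position(entry: str) -> ExonSpan:
    """Parse a 'start-end' string such as '7807688-7807795'."""
    first_text, second_text = entry.strip().split("-")
    first, second = int(first_text), int(second_text)
    return ExonSpan(min(first, second), max(first, second), first > second)


if __name__ == "__main__":
    for text in ("7807688-7807795", "3318017-3317932"):
        span = parse_nt_position(text)
        print(text, span.low, span.high, span.length, span.descending)
```

Storing the pair in low-high order keeps the length calculation independent of how the entry was written.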
These genes were selected by analysis of variance, such that the P value is less than 0.01, the 90th percentile exhibits a minimum of 100 average intensity across all samples, and a comparison of any group means shows a minimum 3 fold change. The interesting expression patterns can be broadly defined into the following categories:

1. Genes that are expressed early in the time course of androgen withdrawal, then drop off in expression, and then express again with emergence of androgen-independence ([...] pattern in table 2A).

2. Genes that are expressed early in the time course, then drop off in expression immediately after androgen-withdrawal, and [...] androgen-independence (hi-Mo-lo pattern in table 2A).

3. Genes that are expressed early in the time course, then drop off in expression after several days of androgen withdrawal, and do not express again with emergence of androgen-independence (hi-N-lo-lo pattern in table 2A).

4. Genes that are not expressed early in the time course, but express only with emergence of androgen-independence (lo-lo-lo-hi pattern in table 2A).

5. Genes that are not expressed early in the time course, but then express as androgen is withdrawn and [...] hi-hi pattern in table 2A).

6. Genes that are not expressed early in the time course, but then express as androgen is withdrawn and drop off again with em[...]
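
The selection rule and the hi/lo pattern labels described above can be sketched in code. The following is a minimal illustration, not the authors' implementation. The function names, the assumption that the P value has already been computed by an analysis of variance elsewhere, the reading of the fold-change criterion as highest group mean divided by lowest group mean, and the midpoint rule for calling a stage "hi" or "lo" are all hypothetical choices made for this example.

```python
from statistics import mean, quantiles


def passes_selection(groups, p_value, p_cutoff=0.01,
                     min_intensity=100.0, min_fold=3.0):
    """Apply the three stated filters to one gene.

    groups  : list of lists of intensities, one inner list per time-course stage
    p_value : ANOVA P value for the gene, assumed to be computed elsewhere
    """
    all_values = [v for g in groups for v in g]
    # 90th percentile of intensity across all samples (last of 9 decile cuts).
    p90 = quantiles(all_values, n=10, method="inclusive")[-1]
    group_means = [mean(g) for g in groups]
    lowest, highest = min(group_means), max(group_means)
    # Assumption: "3 fold change" means highest mean / lowest mean >= 3.
    fold = highest / lowest if lowest > 0 else float("inf")
    return p_value < p_cutoff and p90 >= min_intensity and fold >= min_fold


def pattern_label(group_means):
    """Label each stage 'hi' or 'lo' relative to the midpoint between the
    lowest and highest group mean (an illustrative rule, not from the source)."""
    midpoint = (min(group_means) + max(group_means)) / 2
    return "-".join("hi" if m >= midpoint else "lo" for m in group_means)


if __name__ == "__main__":
    # Made-up intensities for four stages of the time course.
    example = [[20, 30], [25, 35], [300, 340], [30, 40]]
    print(passes_selection(example, p_value=0.001))
    print(pattern_label([mean(g) for g in example]))
```

A real analysis would compute the P value from the same grouped intensities; it is passed in here only to keep the sketch free of external dependencies.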



Table 2B lists accession numbers for primekeys lacking a unigeneID in table 2A. For each probeset is listed a gene cluster number from which oligonucleotides were designed.
Gene clusters were compiled using sequences derived from Genbank ESTs and mRNAs. These sequences were clustered based on sequence similarity using Clustering and
Alignment Tools (DoubleTwist, Oakland, California). Genbank accession numbers for sequences comprising each cluster are listed in the "Accession" column.

!rs in table 2A. For each predicted exon is listed genomic sequence source used for



Pkey: Unique Eos probeset identifier number 

ExAccn: Exemplar Accession number, Genbank accession number

UnigeneID: Unigene number

Unigene Title: Unigene gene title

Pattern: Broadly defined expression patterns during androgen withdrawal



Pkey 



AV65372i 
AK001270 



433412 AV653729 Hs.8185 
429097 
442731 
420820 
422267 
416963 
413277 



UnigeneID Unigene Title



428523 
435847 
443967 
440838 
404054 
431697 
432114 I 



N31537 

H24177 Hs.75262 cathepsin O

AI583661 Hs.60548 hypothetical protein PRO1635

AW974540 Hs.98626 ESTs

W93821 Hs.39780 CDA017 protein

AW294013 Hs.200942 ESTs 

AA907075 Hs.131307 ESTs 

Hs.38540 ESTs, Weakly similar to ALU4_HUMAN ALU S 



Hs.200712 ESTs 
414094 H15088 Hs.31433 ESTs 
424005 AB033041 Hs.137507 vang (van gogh, Drosophila)-like 2
424401 H67220 Hs.169681 death effector domain-containing
449749 AI668611 Hs.49760 ESTs
458368 BE504731 Hs.138827 ESTs
427221 L15409 Hs.174007 von Hippel-Lindau syndrome
432715 AA247152 Hs.200483 ESTs, Weakly similar to KIAA1074 protein
425980 AA365951 gb:EST77963 Pancreas tumor III Homo sapi

gb:EST374677 MAGE resequences, MAGG Homo
gb:od56c02.s1 NCI_CGAP_GCB1 Homo sapiens
Hs.117242 meningioma expressed antigen 6 (coiled-c



412492 
438882 
422473 



AW9626C4 
AA827695 
U94780 

AI640185 
AIC76765 
AI598C22 
A1634046 
R37337 



Hs.283626 ESTs 
Hs.269899 ^ 
Hs.1 93989 
Hs.1573f3 
Hs.1 21 11 



Hs.5944 
Hs.77554 
Hs.295963 
Hs.301444 



ESTs 

hypothetical protein FLJ22625 
hypothetical protein FLJ20980 
Human DNA sequence from clone RP5-1046G1
solute carrier family 11 (proton-coupled
Homo sapiens cDNA FLJ14967 fis, clone TH
ESTs 



419526 ; 

422072 ; 

463459 [ 

419033 I 

413243 I 

432079 I 



gb:EST384840 MAGE resequences, MAGL Homo



lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo



lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
441328 AI982794 Hs.159473 ESTs lo-lo-hi-lo

416508 R39769 ESTs, Moderately similar to ALUB_HUMAN A lo-lo-hi-lo

451066 AI758660 Hs.206132 ESTs lo-lo-hi-lo

446017 N98238 Hs.55185 ESTs lo-lo-hi-lo

447104 R19085 Hs.210706 Homo sapiens cDNA FLJ13182 fis, clone NT lo-lo-hi-lo

447211 AL161961 Hs.17767 KIAA1554 protein lo-lo-hi-lo

447765 AW014112 Hs.161390 ESTs lo-lo-hi-lo

429540 M85776 gb:EST02297 Fetal brain, Stratagene (cat lo-lo-hi-lo

444314 AI140497 gb:Dw76b095l SoaiEsjaaUverjspleen. lo-lo-hi-lo

414556 N98569 Hs.76422 phospholipase A2, group IIA (platelets, lo-lo-hi-lo

432677 NM_004482 Hs.278611 UDP-N-acetyl-alpha-D-galactosamine:polyp lo-lo-hi-lo

422091 AI906339 Hs.97927 ESTs lo-lo-hi-lo

423028 H90946 gb:yu86o02.r1 Soares fetal liver spleen lo-lo-hi-lo

444040 AF204231 Hs.182982 golgin-67 lo-lo-hi-lo

441111 AI806867 Hs.126594 ESTs lo-lo-hi-lo

418838 AW385224 Hs.35198 ectonucleotide pyrophosphatase/phosphodi lo-lo-hi-lo

415999 AA172179 Hs.294029 ESTs lo-lo-hi-lo

429615 AF258627 Hs.211562 ATP-binding cassette, sub-family A (ABC1 lo-lo-hi-lo

427774 AA278583 Hs.180737 Homo sapiens clone 23664 and 23905 mRNA lo-lo-hi-lo

438585 AAB113ri Hs.123362 ESTs lo-lo-hi-lo

424776 AI867931 Hs.164595 ESTs lo-lo-hi-lo

413786 AW613780 Hs.13500 ESTs lo-lo-hi-lo

421077 AK000061 Hs.101590 hypothetical protein lo-lo-hi-lo

445837 AI261700 Hs.145544 ESTs lo-lo-hi-lo

449282 AL048066 Hs.23437 Homo sapiens cDNA FLJ13555 fis, clone PL lo-lo-hi-lo

414065 AW515373 Hs.271249 Homo sapiens cDNA FLJ13580 fis, clone PL lo-lo-hi-lo

432527 AW975028 Hs.102754 ESTs lo-lo-hi-lo

412093 BE242691 HS.U947 ESTs lo-lo-hi-lo

457121 AI743770 Hs.180513 ESTs, Weakly similar to KIAA0822 protein lo-lo-hi-lo

417280 AW173116 Hs.250103 ESTs lo-lo-hi-lo

452446 AB002438 Hs.29596 Homo sapiens mRNA from chromosome 5q21-2 lo-lo-hi-lo

438624 AA889055 Hs.123468 ESTs lo-lo-hi-lo

442343 AA992480 Hs.129874 ESTs lo-lo-hi-lo

401416 C14000338*:gi|7459502|pir||S74665 outer lo-lo-hi-lo

437176 AW176909 Hs.42346 calcineurin-binding protein calsarcin-1 lo-lo-hi-lo

451663 AI872360 Hs.209293 ESTs lo-lo-hi-lo

449296 AW137268 Hs.270964 ESTs lo-lo-hi-lo

426848 H72531 Hs.36190 ESTs lo-lo-hi-lo

445467 AI239832 Hs.15617 ESTs, Weakly similar to ALU4_HUMAN ALU S lo-lo-hi-lo

418662 AI801098 Hs.151500 ESTs lo-lo-hi-lo

416239 AL038450 Hs.48948 ESTs lo-lo-hi-lo

428054 AI948688 Hs.266619 ESTs lo-lo-hi-lo

436284 AA8T9470 Hs.9684g Homo sapiens cDNA FLJ11492 fis, clone HE lo-lo-hi-lo

424332 AA338919 Hs.101615 ESTs lo-lo-hi-lo

442369 AI565071 Hs.169983 ESTs lo-lo-hi-lo

420717 AA284447 Hs.271887 ESTs lo-lo-hi-lo

439584 AA838114 Hs.221612 ESTs lo-lo-hi-lo

440260 AI972867 Hs.7130 copine IV lo-lo-hi-lo

426269 H15302 Hs.168950 Homo sapiens mRNA; cDNA DKFZp566A1046 (f lo-lo-hi-lo

428398 AI249368 Hs.98558 ESTs lo-lo-hi-lo

407276 AI951118 Hs.326736 Homo sapiens breast cancer antigen NY-BR lo-lo-hi-lo

409339 AB020686 Hs.54037 ectonucleotide pyrophosphatase/phosphodi lo-lo-hi-lo

442150 AI368153 Hs.70983 PTPL1-associated RhoGAP 1 lo-lo-hi-lo

416787 H01463 Hs.93534 ESTs lo-lo-hi-lo

430686 AI690234 Hs.191666 ESTs, Weakly similar to GNMSLL retroviru lo-lo-hi-lo

443794 N94104 Hs.29280 ESTs lo-lo-hi-lo

446216 AW821329 Hs.14368 SH3 domain binding glutamic acid-rich pr lo-lo-hi-lo

441285 NM_002374 Hs.167 microtubule-associated protein 2 lo-lo-hi-lo

448738 BE614081 gb:601503815F1 NIH_MGC_71 Homo sapiens c lo-lo-hi-lo

403746 ENSP00000226812*:KIAA1494 protein (Fragm lo-lo-hi-lo

434022 R18374 Hs.117956 ESTs lo-lo-hi-lo

435714 AA699325 Hs.269880 ESTs lo-lo-hi-lo

439848 AW979249 gb:EST391359 MAGE resequences, MAGP Homo lo-lo-hi-lo

421974 AA301270 gb:EST14192 Testis tumor Homo sapiens cD lo-lo-hi-lo

433332 AI367347 Hs.44898 Homo sapiens clone TCCCTA00151 mRNA sequ lo-lo-hi-lo

449919 AI674685 Hs.200141 ESTs lo-lo-hi-lo

407192 AA609200 gb:af12e02.s1 Soares_testis_NHT Homo sap lo-lo-hi-lo

436169 AA888311 Hs.17602 Homo sapiens cDNA FLJ12381 fis, clone MA lo-lo-hi-lo

418624 AI734080 Hs.104211 ESTs lo-lo-hi-lo

432432 AA541323 Hs.116831 ESTs lo-lo-hi-lo

426172 AA371307 Hs.125056 ESTs lo-lo-hi-lo

401093 C12000586*:gi|6330167|dbj|BAA86477.1|tA lo-lo-hi-lo

426716 NM_006379 Hs.171921 sema domain, immunoglobulin domain (Ig), lo-lo-hi-lo

439569 AW602166 Hs.222399 CEGP1 protein lo-lo-hi-lo

451720 AW970985 Hs.290853 ESTs lo-lo-hi-lo

429163 AA884766 gb:am20a10.s1 Soares_NFLT_GBC_S1 Homos lo-lo-hi-lo

432435 BE218886 Hs.282070 ESTs lo-lo-hi-lo

408170 AW204516 Hs.31835 ESTs lo-lo-hi-lo

433530 BE349534 Hs.281789 ESTs lo-lo-hi-lo

425776 U25128 Hs.159499 parathyroid hormone receptor 2 lo-lo-hi-lo

430068 AA464964 gb:zx80n0.s1 Soares ovary tumor NbHOTH lo-lo-hi-lo

422725 AA315703 Hs.199gg3 ESTs, Weakly similar to ALUB_HUMAN !!!! lo-lo-hi-lo



432314 


AA533447 


Hs.31298g 


ESTs


lo-lo-hi-lo


434609 


R76593 




gb:vi60c11.r1 Soares placenta Nb2HP Homo


lo-lo-hi-lo


448760 


AA3I3825 


Hs.21941 


AD03S protein 


lo-lo-hi-lo


456334 


AF164142 
T50392 


Hs.82042 
Hs.271745 


solute carrier family 23 (nucleobase tra
ESTs


lo-lo-hi-lo
lo-lo-hi-lo


435445 


AA737345 


Hs.294041 


ESTs 




438869 


AA888624 
AF075009 


Hs.197289 


rab3 GTPase-activating protein, non-cata
gb:Homo sapiens full length insert cDNA


lo-lo-hi-lo


423932 


T95633 


Hs.189703 


ESTs 




422222 


AI699372 


Hs.1 93247 


hypothetical protein DKFZp434A171


lo-lo-hi-lo


434941 


AW073202 


Hs.334825 


Homo sapiens cDNA FLJ14752 fis, clone NT


lo-lo-hi-lo


415736 


AA827082 




ESTs 




432722 


AA830532 


Hs.326150 


ESTs 




435511 


AA683336 


Hs.189046 


ESTs 






AW022715 


Hs.162160 


ESTs, Weakly similar to ALU4_HUMAN ALU S






AW772713 


Hs.247ia6 


ESTs 


lo-lo-hi-lo


450546 


AA010200 
BE086815 
AF086134 


Hs.175551 
Hs.94309 


ESTs 
ESTs 
ESTs 


lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo


452688 
41S66S 


AA721140 
NM_005025


Hs.49930
Hs.78589


ESTs, Weakly similar to putative p150 [H
serine (or cysteine) proteinase inhibito


lo-lo-hi-lo
lo-lo-hi-lo


450164 


AI239923 


Hs.63931 


ESTs 


lo-lo-hi-lo


417169 


R13550 


Hs,246773 


ESTs 


ki-to-hi-lo 


443645 


R36475 


Hs.24321 


Homo sapiens cDigAFU12C28 lis. done HE 


lo-lo-hl-lQ 


424878 


H57111 


Hs.221132 




lo-Io-hl-la 


449618 


AI076459 


Hs.15978 


KIAA1272 protein 


ki-lo-hl-ia 


432572 


AI660840 


Hs.191202 


ESTs, Weakly similar lo ALUEJHUMAN llll 


lo-lo-hMo 


400293 


N51002 


Hs.306480 


Homo sapiens mRNA; cDNA DKFZp761 E21 12 (f 


lo-k)-hi-lo 


431474 


ALI 33990 


Hs.1 90642 


CEGP1 protein 


lo-lo4ii-lo 


421674 


T10707 


Hs.296355 


hypolhelfcal protein FU23138 


lo-lo-hl-lo 




AA908678 


Hs.1 301 83 


ESTs 


Io-lo-hi-lo 


425332 


AA633306 


Hs.127279 


ESTs 


lo4o-W-b 


451411 AA017492 Hs.135655 EST

419972 AL041466 Hs.182982 golgin-67

434604 AA649530 Hs.348148 gb:iis44i05.s1 Na_CGAPJJv1 Homo sapiens

442832 Hs.253569 ESTs lo-lo-hi-lo

408660 AA625776 ESTs, Moderately similar to PC4259 ferri lo-lo-hi-lo

432674 AA641092 Hs.257339 ESTs, Weakly similar to I38022 hypotheti lo-lo-hi-lo

448150 ESTs lo-lo-hi-lo

450468 AW379075 Hs.141742 Homo sapiens cDNA FLJ12211 fis, clone MA lo-lo-hi-lo

452874 hypothetical protein FLJ10199 lo-lo-hi-lo

412088 AI689496 Hs.108932 ESTs lo-lo-hi-lo

443451 AI057404 Hs.58698 ESTs lo-lo-hi-lo

453853 AL040600 Hs.188083 ESTs lo-lo-hi-lo

419863 AW952691 Hs.93485 Homo sapiens mRNA; cDNA DKFZp761D191 (fr lo-lo-hi-lo

420729 AW964897 Hs.290825 ESTs lo-lo-hi-lo

440801 AA906366 Hs.190535 ESTs lo-lo-hi-lo

407284 AI539227 Hs.214039 hypothetical protein FLJ23556 lo-lo-hi-lo

428279 AM25310 Hs.155766 ESTs, Weakly similar to A47582 B-cell gr lo-lo-hi-lo

436862 AI821940 ESTs, Moderately similar to ALU8_HUMAN A lo-lo-hi-lo


432340 
442048 


AA534222 
AA974603 




gb:nl21d02,s1 NQ.CGAPJWI Homo sapiens 
gb:op34f05.s1 Soares_NFLT_GBC_S1 Homo s 


lo-kj-hl-lo 


418781 T41160 Hs.8404 ESTs

450642 R39773 Hs.7130 copine IV lo-lo-hi-lo

451661 AB020650 Hs.26777 Homo sapiens, Similar to KIAA0843 protei lo-lo-hi-lo

435812 AA700439 ESTs

448066 AI459177 ESTs, Moderately similar to ALU7_HUMAN A

453486 AL039201 Hs.173554 ubiquinol-cytochrome c reductase core pr




414312 
438980 


AA155694 
AW502384 




ESTs 

gb;UI-HF-BRQp-aka-f-1 2-0-Ul.r1 NIH_MGC_6 


lo-lo-K-to 


408001 AA046458 Hs.95296 ESTs lo-lo-hi-lo

AW953805 Hs.21887 ESTs

414426 D60746 Hs.25925 Homo sapiens, clone MGC:15393, mRNA, com lo-lo-hi-lo

444563 N57057 Hs.284163 ANKHZN protein lo-lo-hi-lo

418771 AA807881 Hs.25329 ESTs

417843 W07361 Hs.22545 Homo sapiens cDNA FLJ12935 fis, clone NT lo-lo-hi-lo

415565 AA642449 Hs.48994 ESTs, Weakly similar to AF151800 1 Ca4 lo-lo-hi-lo

419229 AI827237 Hs.282884 ESTs lo-lo-hi-lo

419905 AW248229 Hs.93659 protein disulfide isomerase related prot lo-lo-hi-lo

452870 AW502751 Hs.309G9 KIAA0430 gene product lo-lo-hi-lo

AK000566 Hs.98135 hypothetical protein FLJ20559

416157 NM_003243 Hs.342874 transforming growth factor, beta recepto lo-lo-hi-lo


439305 


AW393883 


Hs.98968 


hypotheScal protein FU23058 
neuiotrlmin 


lo-to-hi-lo 


419235 


AW470411 


Hs.288433 


lo-lo-hl-lD 


416640 Hs.79404 neuron-specific protein lo-lo-hi-lo

434938 AW500718 Hs.8116 Homo sapiens, clone MGC:16169, mRNA, com lo-lo-hi-lo

AI241733 Hs.43871 ESTs lo-lo-hi-lo

438459 T49300 Hs.35304 Homo sapiens cDNA FLJ13655 fis, clone PL lo-lo-hi-lo

418381 AA682393 Hs.119237 ESTs lo-lo-hi-lo

432161 AK000400 Hs.341181 ESTs, Weakly similar to envelope [H.sapi lo-lo-hi-lo

418283 S79895 Hs.83942 cathepsin K (pycnodysostosis) lo-lo-hi-lo

421443 BE560141 Hs.166148 hypothetical protein FLJ13231 lo-lo-hi-lo
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416619 AF013168 Hs.79393 tuberous sclerosis 1 lo-lo-hi-lo 

449802 AW901804 Hs.23984 hypothetical protein FLJ20147 lo-lo-hi-lo

446714 W73818 Hs.110028 ESTs lo-lo-hi-lo

413195 AA127382 Hs.22404 protease, serine, 12 (neurotrypsin, moto lo-lo-hi-lo

438233 W62448 Hs.56147 ESTs lo-lo-hi-lo

416051 AA835868 Hs.25253 mannosidase, alpha, class 1A, member 1 lo-lo-hi-lo

438855 AW946276 Hs.6441 Homo sapiens mRNA; cDNA DKFZp586J021 (fr lo-lo-hi-lo

425907 AA365752 Hs.155965 ESTs lo-lo-hi-lo

451295 AI557212 Hs.17132 ESTs, Moderately similar to I54374 gene lo-lo-hi-lo

416443 T07363 Hs.7948 ESTs lo-lo-hi-lo

422366 T83882 Hs.97927 ESTs lo-lo-hi-lo

435163 AA668884 Hs.19155 ESTs lo-lo-hi-lo

426559 AB001914 Hs.170414 paired basic amino acid cleaving system lo-lo-hi-lo

448988 Y09763 Hs.22785 gamma-aminobutyric acid (GABA) A recepto lo-lo-hi-lo

453655 AW960427 Hs.342874 transforming growth factor, beta recepto lo-lo-hi-lo

414516 AI307802 Hs.135560 ESTs, Weakly similar to T43458 hypotheti lo-lo-hi-lo

420028 AB014680 Hs.8785 carbohydrate (N-acetylglucosamine-6-O) s lo-lo-hi-lo

430223 NM_002514 Hs.235935 nephroblastoma overexpressed gene lo-lo-hi-lo

425887 AL049443 Hs.161283 Homo sapiens mRNA; cDNA DKFZp586N2020 (f lo-lo-hi-lo

442577 AA292998 Hs.163900 ESTs lo-lo-hi-lo

424940 AA985308 Hs.283902 ESTs lo-lo-hi-lo

428839 AI767766 Hs.82302 Homo sapiens cDNA FLJ14814 fis, clone NT lo-lo-hi-lo

443868 W88483 Hs.293650 Homo sapiens mRNA for RGPR-p117, complet lo-lo-hi-lo

430334 AI824719 Hs.328700 ESTs lo-lo-hi-lo

439686 W40445 Hs.235857 ESTs, Weakly similar to I38022 hypotheti lo-lo-hi-lo

423754 NM_016181 Hs.132526 melanoma antigen lo-lo-hi-lo

415205 H71516 Hs.135233 ESTs lo-lo-hi-lo

426413 AA377823 gb:EST90805 Synovial sarcoma Homo sapien lo-lo-hi-lo

407204 R41933 Hs.140237 ESTs, Weakly similar to ALU1_HUMAN ALU S lo-lo-hi-lo

430234 N29317 Hs.236463 KIAA1238 protein lo-lo-hi-lo

437143 AW204056 Hs.8917 ESTs lo-lo-hi-hi

445162 AB011131 Hs.12376 piccolo (presynaptic cytomatrix protein) lo-lo-hi-hi

415083 AI632683 Hs.27179 Homo sapiens cDNA FLJ12933 fis, clone MT lo-lo-hi-hi

442924 AA533513 Hs.93659 protein disulfide isomerase related prot lo-lo-hi-hi

429536 AA873016 Hs.206097 oncogene TC21 lo-lo-hi-hi

468584 AF217618 Hs.324136 PTD012 protein lo-lo-hi-hi

419647 AA343947 Hs.91816 hypothetical protein lo-lo-hi-hi

427201 AB037860 Hs.173933 nuclear factor I/A lo-lo-hi-hi

428030 AI915228 Hs.11493 Homo sapiens cDNA FLJ13536 fis, clone PL lo-lo-hi-hi

411779 AA292811 Hs.72050 non-metastatic cells 5, protein expresse lo-lo-hi-hi

442482 NM_014039 Hs.8360 PTD012 protein lo-lo-hi-hi

417468 NM_005655 Hs.82173 TGFB inducible early growth response lo-lo-hi-hi

438021 AV653790 Hs.324275 WW domain-containing protein 1 lo-lo-hi-hi

409799 D11928 Hs.76845 phosphoserine phosphatase-like lo-lo-hi-hi

440676 NM_004987 Hs.112378 LIM and senescent cell antigen-like doma lo-lo-hi-hi

421437 AW821252 Hs.104336 hypothetical protein lo-lo-hi-hi

456362 Amiam Hs.179909 hypothetical protein FLJ22995 lo-lo-hi-hi

407686 AW501268 Hs.126043 chromosome 21 open reading frame 51 lo-lo-hi-hi

431129 AL137751 Hs.263671 Homo sapiens mRNA; cDNA DKFZp434I0812 (f lo-lo-hi-hi

431874 AW610031 Hs.323914 translocase of inner mitochondrial membr lo-lo-hi-hi

448072 AI459305 Hs.24908 ESTs lo-lo-hi-hi

436860 H12751 Hs.5327 PRO1914 protein lo-lo-hi-hi

448770 AA326683 Hs.21992 likely ortholog of mouse variant polyade lo-lo-hi-hi

428044 AA093322 Hs.301404 RNA binding motif protein 3 lo-lo-hi-hi

451468 AW503398 Hs.293663 ESTs, Moderately similar to I38022 hypot lo-lo-hi-hi

440278 BE560870 Hs.9052 ESTs, Weakly similar to 2004399A chromos lo-lo-hi-hi

441102 AA973905 intermediate filament protein syncoilin lo-lo-hi-hi

423942 AF209704 Hs.135723 glycolipid transfer protein lo-lo-hi-hi

425254 U91985 Hs.105658 DNA fragmentation factor, 45 kD, alpha p lo-lo-hi-hi

409324 W76202 Hs.343812 lipoic acid synthetase lo-lo-hi-hi

431707 R21326 Hs.267905 hypothetical protein FLJ10422 lo-lo-hi-hi

423335 AB018337 Hs.127287 KIAA0794 protein lo-lo-hi-hi

429200 AA447871 Hs.194215 ESTs, Weakly similar to I38022 hypotheti lo-lo-hi-hi

429898 AW117322 Hs.42366 ESTs lo-lo-hi-hi

409604 AW444448 Hs.49124 ESTs lo-lo-hi-hi

431797 BE169641 Hs.270134 hypothetical protein FLJ20280 lo-lo-hi-hi

437576 BE514383 prothymosin, alpha (gene sequence 28) lo-lo-hi-hi

415992 C05837 Hs.145807 hypothetical protein FLJ13593 lo-lo-hi-hi

458537 W24704 Hs.54773 ESTs lo-lo-hi-hi

417665 AW852858 Hs.22862 ESTs lo-lo-hi-hi

422292 AI815733 Hs.114360 transforming growth factor beta-stimulat lo-lo-hi-hi

421501 M29971 Hs.1384 O-6-methylguanine-DNA methyltransferase lo-lo-hi-hi

457952 U25750 Human chromosome 17q21 mRNA clone 1046:1 lo-lo-hi-hi

414630 BE410857 Hs.16064 gb:601301177F1 NIH_MGC_21 Homo sapiens c lo-lo-hi-hi

421990 T31811 Hs.110480 DC12 protein lo-lo-hi-hi

404956 C1003210*:gi|6912582|ref|NP_036524.1| pe lo-lo-hi-hi

436829 AW297958 Hs.163109 ESTs lo-lo-hi-hi

402106 AK002178 hypothetical protein FLJ11316 lo-lo-hi-hi

404384 NM_020832*:Homo sapiens ATPase, H(+)-tra lo-lo-hi-hi

445123 AI762911 Hs.145369 ESTs lo-lo-hi-hi

401757 Target Exon lo-lo-hi-hi

439502 AA836672 Hs.130694 ESTs lo-lo-hi-hi
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400111 

405446 AI015709 
401563 
402786 

426484 AA379658 Hs.272759 

414343 AL036166 Hs.323378 

421970 AF227156 Hs.1 10103 

422592 BE081857 Hs.94211 

413431 AW246428 Hs.75355 

426746 J03626 Hs.2057 
400237 



448622 AL046508 
400501 

452324 W81486 Hs.58648 

453146 AI338952 H3.32194 

430445 AWB92432 Hs.65307 
401750 

435236 T03890 Hs.157206 

400375 NM_014115 

412151 AA100529 Hs.286232 

410498 AA355749 

405044 

413169 AW161061 Hg.62954 
402101 

455019 AW850818 

446826 AK000626 H3.16230 

412180 AW89a791 Hs.1 18837 

407273 AJ1 32560 

452895 BE389229 Hs.30954 

416117 H19480 Hs.268787 

430934 AI792302 HS.248U1 

416309 R84694 Ha.79194 

444578 T80795 Hs.1 93702 



405435 

422694 C06003 Hs.23782 

422912 AW405973 Hs.11637 

412748 BE083158 Hs.10862 
403704 

440507 H06994 
405503 

456123 R00602 

454261 AF216077 Hs.48376 

458956 BE220675 

418367 AA326035 Hs.59236 

444553 AI157530 Hs.149380 

429461 AI188219 Hs.99311 

423378 BE313601 Hs.164866 

458516 BE010749 Hs.255097 



454148 
412678 
449298 
405525 
424576 
451601 

434333 
413509 
419604 



421911 
407813 
425211 
442772 
419733 

428260 AW290BBS Hs.86999 
427083 NII4.006383 Hs.173497 



Eos Control 

Homo sapiens mRNA; cDNA DKFZp586I2022 (f
C150012S2:gi|7304981HNP_038528.1|ca
C1000887*:gi|12732453|ref|XP_011474.1| C
KIAA1457 protein
coated vesicle membrane protein
RNA polymerase I transcription factor RR
rcd1 (required for cell differentiation,
ubiquitin-conjugating enzyme E2N (homolo
uridine monophosphate synthetase (orotat
NM_001087*:Homo sapiens angio-associated
Target Exon
Target Exon
ESTs

NM_014080:Homo sapiens dual oxidase-like
.270607 ESTs, Weakly similar to STK2_HUMAN SERIN
ENSP00000251912*:KIAA1617 protein (Fragm



NM_012448*:Homo sapiens signal transduce
ESTs, Highly similar to ARX MOUSE HOMEOB
NM_014115*:Homo sapiens PRO0113 protein
Homo sapiens cDNA: FLJ23190 fis, clone L
gb:EST64459 Jurkat T-cells VI Homo sapie
NM_014630*:Homo sapiens KIAA0211 gene pr
ESTs, Weakly similar to zinc finger prot
ENSP00000217725*:Laminin alpha-1 chain p
gb:IL3-CT0220-091199-026-A03 CT0220 Homo
hypothetical protein FLJ20619
gb:CMO-NN0075-130400-332-f06 NN0075 Homo
gb:Homo sapiens mRNA for Im
phosphomevalonate kinase
ESTs



444850 AW4448B2 Hs.1 48483 ESTs 



C17000574:gl|8923190|rBflNP_06017ai|iiy 



Target Exon 
Target Exon 

hypothetical protein FU 12847 
ESTs 

Homo sapiens cDNA: FLJ2331 3 fis, clone H 
Target Exon 

gb:yl81i307.r1 Scares Infant brain INiB H 
C7000609*:gl|628012|plr||A53933 myosin i 
gbve74c04.r1 Soaes fetai liver spten 
Homo sapiens clone HB-2 mRNA sequence 
gb:ht98f11.x1 NCLCGAP_Lu24 Homo sapiens 
hypothetical protein OKFZp434L0718 
ESTs 

NM_024810;Homo sapiens hypothetical pro! 
ESTs, Weakly similar to HSJ2_HUMAN DNAJ 
hypothetical protein FLJ22558 



AW732837 Hs.42390 
AA1 15575 Hs.1 14914 
AI911333 Hs.171689 



Hs.96833 ESTs 



N M_002439*:Homo sapiens mutS (E. ccli) h 



oentrosomal protein 1 
DKFZP434BC335 protein 
stromal cell protein 

gb:IL5-HT0198-291099-009-E01 HT0198 Homo 



448586 AF285120 Hs.283734 



UWm Hs.1674 

AA001021 Hs.6685 
AA343729 

AI432652 Hs.42824 

NM_001523 Hs.57697 

AL1 20445 Hs.77823 
AL041520 

AL1 20247 Hs.4010g 

I\fl18667 Hs.1867 

AW50368Q Hs.5967 

AW362955 Hs.224961 



thyroid hcrrr»ne leceptor interactor 8 
gb:EST4g73C Gail bladder I Homo sapiens 
hypothelical protein FU10718 
hyaluronan synthase 1 
hypothelloal protein FU21343 
gb;DKFZp434G2317_s1 434 {synonym: htes3) 
KIAA0872 protein 
progastricsin {pepsinogen C) 
Homo sapiens done 24416 mRNA sequence 
Homo sapiens cDNA FU14415 lis, done HE 
ESTs, Weakly similarto S65657 alpha-1 C- 
Sec23 (S. oerevisiae) homolog B 



lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi



lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi



lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi



lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi



lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi



lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi

lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi



lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
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418583 


AA604379 


Hs.86211 


hypoUietical protein 




407355 AA846203 Hs.193974 ESTs, Weakly similar to ALU1_HUMAN ALU S lo-lo-hi-hi

454003 AA058944 Hs.116602 Homo sapiens, clone IMAGE:4154008, mRNA, lo-lo-hi-hi

425322 U63630 protein kinase, DNA-activated, catalytic lo-lo-hi-hi

402240 Target Exon lo-lo-hi-hi

421867 AA48107a Hs.109045 hypothetical protein FLJ10498 lo-lo-hi-hi

408603 Hs.326416 Homo sapiens mRNA; cDNA DKFZp564H1916 (f lo-lo-hi-hi

437389 AL359587 Hs.271586 hypothetical protein DKFZp762M115 lo-lo-hi-hi




AF091035 Hs.184627 KIAA0118 protein

400277 Eos Control lo-lo-hi-hi

400995 C11000295*:gi|12737279|ref|XP_012163.1|

400818 Target Exon

C1001899*:gi|12722636|ref|XP_010672.1| e lo-lo-hi-hi

403708 Target Exon

405610 ENSP00000241065*:CDNA lo-lo-hi-hi

414242 AA749230 Hs.26433 dolichyl-phosphate (UDP-N-acetylglucosam

420757 X78592 Hs.99915 lo-lo-hi-hi

400965 C11002190*:gi|12737279|ref|XP_012163.1| lo-lo-hi-hi

401192 Target Exon lo-lo-hi-hi


404407 Target Exon

401405 Target Exon lo-lo-hi-hi

403055 C2002219*:gi|12737280|ref|XP_006682.2| k lo-lo-hi-hi

404661 C9000306*:gi|12737280|ref|XP_006682.2| k lo-lo-hi-hi

433627 AF078866 Hs.284296 Homo sapiens cDNA: FLJ22993 fis, clone K lo-lo-hi-hi

410204 AJ243425 Hs.326035 early growth response 1 lo-lo-hi-hi

432642 BE297635 Hs.3069 heat shock 70kD protein 9B (mortalin-2) lo-lo-hi-hi

400769 Target Exon lo-lo-hi-hi

433980 AA137152 Hs.236049 phosphoserine aminotransferase lo-lo-hi-hi










lo-lo-hi-hi 




AA156164 Hs.286241 protein kinase, cAMP-dependent, regulato lo-lo-hi-hi

422614 AI908006 Hs.295362 Homo sapiens cDNA FLJ14459 fis, clone HE lo-lo-hi-hi

NM_006513*:Homo sapiens seryl-tRNA synth

NM_004930*:Homo sapiens capping protein lo-lo-hi-hi

452049 BE268289 Hs.27693 peptidylprolyl isomerase (cyclophilin)-l lo-lo-hi-hi

Hs.6838 ras homolog gene family, member E lo-lo-hi-hi

428770 AK001667 Hs.193128 hypothetical protein FLJ10805 lo-lo-hi-hi

428403 AI393048 Hs.326159 leucine rich repeat (in FLII) interactin

434647 W74158 Hs.103189 lipopolysaccharide specific response-68 lo-lo-hi-hi

ENSP00000235229:SEMB.

413992 W26276 Hs.136075 RNA, U2 small nuclear lo-lo-hi-hi


407191 AA608751 gb:ae56h07.s1 Stratagene lung carcinoma lo-lo-hi-lo

403328 Target Exon lo-lo-hi-hi

411984 NM_005419 Hs.72988 signal transducer and activator of trans lo-lo-hi-lo

451017 BE391847 Hs.181173 hypothetical protein MGCia771 lo-lo-hi-hi

404108 C7000911*:gi|4235142|gb|AAD14470.1| (AGO lo-lo-hi-hi

407819 R42185 Hs.102720 ESTs lo-lo-hi-hi

435876 AW612586 Hs.160271 G protein-coupled receptor 48 lo-lo-hi-lo

436716 AI43354a gb:tl69g05.x1 NCI_CGAP_Kid11 Homo sapien lo-lo-hi-hi

401419 Target Exon lo-lo-hi-hi


424363 
408866 


AW512144 
AW292096 


HSJ46947 
Hs^036 


ESTs, Weakly sMar to A48B09 carboxyle 


lo-lo-hl4il 
b-k>-hM 


415516 


F11411 




ESTs 

gb:HSC2WF081 nomiallzed Infant brain cDN 


lo-lo4il4iI 


423144 AW851527 Hs.253677 ESTs, Weakly similar to I38022 hypotheti lo-lo-hi-hi

452560 BE077084 Hs.99965 lo-lo-hi-hi

439827 AA846538 Hs.187389 ESTs lo-lo-hi-hi

419709 AA255592 Hs.347973 ESTs, Weakly similar to alternatively sp

413672 BE156536 gb:QV0-HT0368-310100-091-h10 HT0368 Homo lo-lo-hi-hi

425291 AA354572 gb:EST62857 Jurkat T-cells V Homo sapien lo-lo-hi-hi

427403 AA402107 Hs.257146 ESTs, Moderately similar to I38022 hypot lo-lo-hi-hi

430911 AW937461 Hs.255377 ESTs

435293 AI040777 Hs.117170 ESTs

448490 AI523897 Hs.271692 ESTs, Weakly similar to I38022 hypotheti lo-lo-hi-hi

449539 W80363 Hs.58446 ESTs

AW978811 Hs.314461 ESTs, Weakly similar to ALU1_HUMAN ALU S lo-lo-hi-hi

459407 N92114 gb:za22h11.r1 Soares fetal liver spleen lo-lo-hi-hi

423231 AA323486 Hs.271273 Homo sapiens cDNA FLJ12335 fis, clone MA lo-lo-hi-hi


450628 AW382884 ESTs lo-lo-hi-hi

411690 AA669253 Hs.136075 RNA, U2 small nuclear lo-lo-hi-hi

414739 U83867 Hs.77196 spectrin, alpha, non-erythrocytic 1 (alp lo-lo-hi-hi

444169 AV648170 Hs.58766 ESTs lo-lo-hi-hi

Hs.100293 O-linked N-acetylglucosamine (GlcNAc) tr

422195 AB007903 Hs.113082 KIAA0443 gene product lo-lo-hi-hi

452704 AA027823 Hs.149424 Homo sapiens PNAS-130 mRNA, complete cds lo-lo-hi-hi

425074 AA495930 Homo sapiens cDNA: FLJ22165 fis, clone H lo-lo-hi-hi

426376 N46752 Hs.302955 ESTs lo-lo-hi-hi

AW073310 Hs.163633 Homo sapiens cDNA FLJ14142 fis, clone MA lo-lo-hi-hi

413686 AI469213 Hs.71404 ESTs lo-lo-hi-hi

449000 U69560 Hs.3826 kelch-like protein C3IP1 lo-lo-hi-hi

430064 Hs.231436 hypothetical protein FLJ20084 lo-lo-hi-hi

412205 N33818 Hs.20274 ESTs, Weakly similar to unnamed protein lo-lo-hi-hi

423955 AI420582 Hs.136164 cutaneous T-cell lymphoma-associated tum lo-lo-hi-hi

456619 BE063853 gb:QV3-BT0296-011299-022-g09 BT0296 Homo lo-lo-hi-hi
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4m22 fMSTBSO Hs.298102 ESTs 

459710 AI701596 Hs.121692 ESTs 

417918 AA209205 Hs.163754 hypothetical protein FLJ12606

f^^^* NM_022095*:Homo sapiens hypothetical C2H

424387 AI739312 Hs.284163 ANKHZN protein

427220 AF069517 Hs.173993 RNA binding motif protein 6

410451 BE055687 gb:RC3-BT0316-270400-015-t10 BT0316 Homo

fj>™ NM_005165*:Homo sapiens nuclear factor r

407218 AA095473 Hs.28505 ubiquitin-conjugating enzyme E2H (homolo

449312 N71673 Hs.223ee6 ESTs

419612 AI498267 Hs.110613 KIAA0421 protein

455272 BE148152 gb:RC4-HT0231-041199-012-b04 HT0231 Homo
fj^l^ NM_005177*:Homo sapiens ATPase, H+ trans

440422 AW452696 Hs.130760 myosin phosphatase, target subunit 2

436819 AA731746 Hs.120232 ESTs

413644 BE154910 Hs.278793 ESTs, Weakly similar to Z195_HUMAN ZINC

4 3939 AL047051 Hs.199961 ESTs, Weakly similar to ALU7_HUMAN ALU S
448 98 BE622100 Hs.209406 ESTs, Weakly similar to I38600 zinc fing
450488 AA009999 Hs.59159 ESTs, Moderately similar to HPV16 E1 pro
433507 AI817336 Hs.191791 ESTs

438996 AW748336 Hs.110613 KIAA0421 protein 

I^S?? f'^^^ "'-"^^3^ ESTs,Weal(lysliiiilartoALU7_HUMANALUS 

407251 U67611 ttansaldolaae 1 

5 ^^Sl? gbzn04d03.r1S(talagenehNT neuron (937 
409123 M063403 gbziii04d12.s1 Stratsgeneoomeal stroma 

^^f^^" ^-^^ ESTS, WeaWy similar 10 PC4259fen18h 

433735 AA608955 Hs.109653 ESTs 

434404 AW445034 Hs.256578 ESTs 

446667 BE161878 Hs.224805 ESTs 

447982 H22953 Hs.1 37551 ESTs 

438890 AA827756 Hs.135049 ESTs, Weakly similar to ALU7 HUlWAN ALU S 

427882 AA640987 Hs.193767 ESTs 

459680 H96982 Hs.42321 ESTs 

416632 H69480 Hs.141304 ESTs 

453876 AW021748 Hs.110406 ESTs, Weakly similar to 138022 hypothetl 

414528 AA148960 Hs,186836 ESTs 

419902 AA804409 Hs.118920 ESTs 

409542 AA503020 Hs.36563 hypothetfeal protein FU2241 8 

433560 A/92S195 Hs.130891 hypothelloal protein MGC4400 

AAiAnn M.jos^roA . ■■ 1,5,3 

0h™H73fl5 

412156 H29487 Hs.17110 
414505 R45389 
404277 

414662 AL036058 Hs.76807 

444430 A1611153 "" 
445612 N94126 



Hs.12969 



403740 

411034 T18987 

429143 AA333327 

443060 D78B74 

422749 W01076 



gb™d73fl2j<1 NCLCGAP_Lu24 Homo sapiens 
Honro sapiens mRNA; cDNA DKFZp434C2016 (( 
ESTs, Weakly sWIarto A48042 lysosomal 
NM.019111':Homo sapiens major histocompa 
major hisiocompalibilily complex, class 
Homo sapiens cDNA: FU22783 lis, ckjne K 
hypothetical protein 

ENSPC0000251563';UDP-giucufonosyItrans(e 
NM_C01076":Homo sapiens UDP glycosyto 
Hs,125472 ESTs, Moderately similar to KiAA0877 pro 
Hs,197335 plasma giutamate carboxypeplidase 
Hs.8944 procoliagen tendopepiidase enhancer2 
Hs.278573 CD59 antigen p18-20 (antigen identified 
429441 AJ224172 Hs.204096 lipcphiiin B (uteroglobin family member) 
414382 AW38C339 Hs.8068 hematopoietic PBX-interxtin'g protein 
441560 F13386 Hs.7888 Homo sapiens clone 23736 mRNAsequence 
446106 AA377165 Hs.44833 ESTs 

452239 AW37g378 Hs.170121 prolein tyrosine phosphatase, receptor t 
446874 AW968304 Hs.56156 ESTs 

412795 BE241753 Hs.74692 special AT-nch sequence binding protein 

430325 AF004562 Hs.239356 syntaxin binding pnitein 1 

426392 AWg68324 Hs.17384 ESTs 

447448 BE244285 F-box only protein 29 

415743 AA167664 Hs,14333 ESTs, Weakly similar to Z195_HUMAN ZINC 

4316C7 AB033097 Hs.183669 KIAA1271 protein 

411979 X85134 Hs.72984 retlnoblastoma-binding protein 5 

453620 BE396163 Hs,25005 ESTs, Weakly similar to AUJ5JHUMAN ALU S 

431099 Y13367 Hs.249235 phospholnoslllde^kinase, dass 2, alph 

^HSI hypothetical protdnMGC14797 

439565 AF086386 Hs.145599 ESTs 

442349 W40516 Hs.132355 Homo sapiens cDNA:FU221 19 lis, clone H 
410096 AW245200 Hs.267400 hypothetical protein MGC5540 
429447 AW812452 Hs.83286 ESTs, Weakly similar to S14747 sphingomy 
431802 AL133570 Hs.270571 Homo sapiens mRNA; cDNA DKFZp434L201 (fr 
441715 Ai929453 Hs.342655 Homo sapiens cDigAFU13289 lis. clone OV 
458230 BE311851 Hs.6639 KIAA1624 protein 
428788 AF0B2283 Hs.193516 &<KllCLtJlyraphoma 10 
450818 AI740573 Hs.1 42827 P311 protein 
419576 AK002060 Hs.91251 hypolheScal prolein FLJ11198 
400401 AF159093 Homo sapiens endogenous retrovirus RANI 



lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi
lo-lo-hi-hi



lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo

lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo



lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo



lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
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423749 U0984B Hs.1 32390 

42B898 AB033070 H5.194408 

45B258 AW406546 Hs.127971 

429521 BE048708 Hs.50949 
402185 

415961 H10983 Hs.155919 

457255 AB023212 Hs.225967 

412419 AW948630 

438397 AA806478 , Hs.1 23206 

440509 BE410132 Hs.1 34202 

423895 AA332215 



445094 AW296163 Hs.147296 ESTs 



ESTs, Weakly similar to T17279 hy| 
gb;EST36124 Embiyo, 8 week 1 Homo sapien 
NM_004651':Homosa| 



432323 AK001409 Hs.274356 

444290 AA262496 

435803 Z44194 

436905 N31273 
401849 



406180 AB018249 

448176 A)672546 Hs.170507 ESTs 

409259 AW608930 Hs.52184 

457335 AW969834 Hs.303303 

452444 BE144022 



hypothelica! protein FU10547 
gb;2s20ni.r1 NCLCGAP.GCBI 1 
transducer of ERBB2, 2 



Target Exon 

C19000553':gi|12741444|fEllXP_008888.2| 
I cytokine subfamily A (Cy 



hypotheticel protein FU20618 



439944 AAB5B767 1 

411283 AW852754 

458195 R10085 I 

452654 BQ)04783 

425684 AF000989 Hs.159201 

429452 AI949495 Hs.1 33998 

431709 AF220185 Hs.267923 

411701 BE181659 

430729 AI572560 Hs.301283 

447476 BE293466 Hs.20880 

450436 AW293661 Hs.131887 
405365 

419656 AA244416 

446103 U90918 Hs.13804 

400986 

424194 BE245833 Hs.1698S4 

400210 

400234 



gb;MFO-HTQ165-19119»W4-1D5HT0165 Home 
Target Exon 
gb:aa33b03.r1 NCLCGAP^GCBI Homo sapiens 
ESTs 

gb:PM1-CT0247-18010(MX»c05 CT0247 Homo 
ESTs 

gb:II^F<2-BN0114-270400me11 BNOIMHomo 
Uiymosin,bela4,Ycliromosome 
Heme sapiens cDNA FU13202 fis, done NT 
■ ■ ■ 1 hypothala ' " — 



428181 AA423976 

456629 AW891965 H6.279789 

426940 AA393537 Hs.98347 

433555 AA535902 Hs.1 46211 

421431 AA550117 H 

448631 AI554923 

433521 T66087 H 

407187 AA446971 

450739 A1732707 H 

440004 BE397117 H 

403947 NM_CC5032 

405529 AW41C458 

404663 
400220 
401444 



428666 
428442 
440151 
431046 
443914 
402469 
418155 
446893 
442336 
421290 
450374 
402347 
415184 
415632 
423718 



AA863167 

AW854382 Hs.249126 

AI0gi173 Hs.222362 

R45481 Hs.23719 

AI610818 Hs.7110 

AW34Cg58 Hs.7572 

NM_014368 Hs.103137 

AA397540 Hs.60293 

AA380436 Hs.211973 

U67085 Hs.78524 

AL1 19520 Hs.180737 



l-191-g07HT0638Homo 

KIAA0793 gene product 

ESTs, Weakly similar to 138022 hypotheti 

ESTs 

CX001212*:gl|7861932|gb|AAF70445.1| (AF2 
gb;nc07cl11.s1 NCI.CGAP.Prl Homo sapiens 
hypothetical protein dJ462023.2 
NM_024085*:Homosapi9is liypothetlcal pro 
gb:TCBAP1E1903 Pediatric pre-B cell acut 
Eos Control 

NM_005336;Homo sapiens high density lipo 
NM_005336:Homo sapiens high density lipo 
NM.022170':Homo sapiens Wlfflams-Beiren 

C16000922;gl|7499103|pii1|T209C3 hypothe 

gb.'zv62ft06.s1 Soaiies_teslls_NHT Homo sap 

histonedeacetylaseS 

ESTs, Weakly sMartoJC5308tesfwp 

Homo sapiens HERC2P7 pseudogene, partial 

ESTs 

gb:te53h12,x1 Soares_NFLT_GBC_S1 Homos 

Homo sapiens unknown mRNA sequence 

gb:2w86f11.s'1 So8res_tateLfetusLNb2HF8_ 

ESTs, Weakly simHarto ALU7JHUtMN ALU S 

hypothetical pmtelnHJ21 845 

plastin3(TisDform) 

chmmosome 11 open leading fi'ame2 

C19001075*:gi|4567179|gb|AAD23607.1|AC00 

ENSP0O00C251884;KIAA1521 protein (Fragme 

Eos Control 

Target Exon 

gb:MR0-HT0164-191199^fi)3HT0164Homo 

Eos Control 

Homo sapiens mRNA; cDNA DKFZp434A1014 (f 
Homo sapiens mRNA; cDNA DKFZp434A202 (fr 
ESTs 

gb.'ak38ea7.s1 Soanes_lestis_NHT Homo sap 
Homo sapiens done 24894 mRNA sequence 
ESTs, Weakly similar to p40 [H.saplens] 
Target Exon 

ESTs, Weakly similar to 138022 hypotheti 

ESTs 

ESTs 



homolog of Yeast RRP4 (ribosomal RNA pni 
TcD37homolog 

Homo sapiens done 23664 and 23905 mRNA 



la.k>hl-lo 
lo-lo-lii-to 
lo-lohi-lo 

lo-MI-lo 



lo-lo-hi-Io 
lo-lo-hl-lo 
lo-ta-hi-lo 
toJo-W-lo 
k>-lo-hi-lo 
kj-lD-hl-lo 
lo-k>-hl-lo 
k>4o-hl-lo 
lo-b-hl-lo 
fc)-lo-hl-lo 
lo-lo-hi-lo 
lo-Io-hi-lo 
lo-lo-hi-lo 



to-lo-hi-lo 
to-lo-hHo 
kMo-hi-lo 
lo-lo-hi-lo 
to-lo-hi-lo 
lo-lo-hi-lo 



to-lo-hi-lo 
b-b-hi-to 
to-lo-h'hio 
to-lo-hl-to 
b-lo-hl-lo 
lo-lo-H-lo 
lo-lo-hl-to 
lo-lo-hHo 



to-lo-hi-to 
lo-lo-hi-to 
lo-lo-hi-lo 
to-lo-hi-to 
lo-lo-hi-lo 
lo-lo-W-lo 
to-lo-hi-lo 
lo-lo-hi-lo 
to-lo-hl-to 
to-lo-hi-lo 
to-lo-hl-lo 
b-lo-hl-to 
to-b-hl-to 
lo-lo-hl-lo 
to-lo-hiJo 
lo-lo-hi-to 
to-lo-hi-lo 
to-to-hi-to 
to-to-iii-to 
to-b-hl-to 
lo-lo-hi-to 
to-lo-hl-to 
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449140 AW013840 

431241 M496799 

416631 H69466 

424168 L29277 

401600 BE247275 

420588 AF000982 

414111 BE047679 

417138 AA193646 

424318 AA476515 

455653 BE154075 

451493 H38656 

457015 AA6B8058 



Hs.321677 

U5 snRNP-speciiic protein, 1 1 6 kD 
Hs.147916 DEAD/H (Asp-Glu-Ala-Asp/His) box polypep 
Hs.152982 hypothetical protein FU13117 
Hs.65771 Homo sapiens chroniosonie 19, BAC CIT-HSPC 
Hs.1 72723 ESTs 

gb:PM0-HTO339-2004(XHI10-EO5 HT0339 Homo 



lo-lo-hi-lo 



451445 
464775 
411053 

435312 



427327 
444321 
405109 
450182 



AL134878 

AI277883 

AI749893 

AI129066 

AA0176C9 

BE16022S 

AWB15061 

AJ243396 

AK000724 

H61899 

AW501456 

AW204210 

N47812 

AI796400 

AU076896 



Hs.146141 
Hs.270532 
Hs.135457 
Hs.343449 



H5.4865 

Hs,301563 

Hs.171937 



428772 
423759 
434350 
442274 
442884 
400481 



AI524039 
AI142358 
AL042940 
AI733484 
AI076570 

I T51008 

I AW291S72 

; BE045344 

i AA179948 

i R07114 



ribosomal protein, laiigeP2 

ESTs 

ESTs, Wealtly similar to 138022 hypotheti 
ESTs 

gbze37e01.r1 SoaiGsrBtlnaN2b4HRHo[no 
gb;CW1-HT0413-090200-062-a12 HT0413 Homo 
gb;CMO-ST0209-271099^)82-d1 0 ST0209 Homo 



Hs.240767 
Hs.154095 
Hs.194718 

Hs.192524 
Hs.1 84361 
Hs.93872 



karyopherin alpha 6 (Import'n alpha?) 
steroid dehydrogenase-like . 
Homo sapiens cDNA: FU22355 fis, clone H 
Homo sapiens mRNA; cDNA DKFZp564N1623 (f 

CGI-35 protein 

Human DMA sequence fian clone RP1-12G14 
zinc linger pfotdn 143 (clone pHZ-1) 
zinc finger pioteln 265 

NU_a21 1 36':Homo sapiens zona peOucida g 



rs, Moderately similErtoAUJ7_HUAMN A 



Hs.1 34053 ESTs 

Hs.258981 
Hs.274923 
Hs.175563 1 
Hs.271224 
Hs.93836 
Hs.101383 



407949 
440296 
422260 
434685 
412657 
405188 



435184 
431475 
445239 
436151 



435351 
403218 
420678 
445808 
429933 



417314 
428290 
422128 
432014 



N58790 
W21874 
D30829 



T67162 

AI557669 

AI217375 

AK000801 

AI523875 



AW517236 
U89277 
AW959165 
AA478883 
T80177 
AL1 34878 



Hs.268820 

Hs,247057 ESTs, Weakly similar to 2109260A B cell 
Hs.180610 splicing factor praline/glutamine rich ( 
Hs.105484 regenerating gene type IV 
Hs.287467 Homosapiens cDtvlA FU11948fis, clone HE 

gb;EST388274MAGE resequences, MAGN Homo 

Target Exon 

gb:qh04c12.x1 Soares_NFL^T_GBC_S1 Homos 
Hs.58606 SNRPN upstream reading frame 
Hs.1 3804 hypothetical protein dJ462023,2 
Hs.135127 ESTs, Weakiy similar to unnamed protein 
Hs.40342 putative nuclear protein 
Hs.1 70023 ESTs, Weakiy similar to CA36_HUMAN COLLA 
Hs.324271 Homo sapiens cDNA FU20794 Ss, done CO 

gb:tg97d04j(1 NCI_CGAP_Cai Homosapiens 
Hs.323502 Homo sapiens cDNA: FU23539 fis, clone L 
Hs.159337 ESTs 
HS.33CT62 ESTs 

Hs.305985 eaiiy development legidatar 1 (homolog o 
H&270034 Homo sapiens. Similar to nudear locallz 



similar to rat nuclear ubiquitous casein 

ribosomal protein, large P2 

AW59328B Hs.3530 TLS-associaled senne-arginine protein 2 

AV655234 ESTs, Moderately similar to PC4259 feni 

AA765596 Hs.1 87691 ESTs 

AA250950 Hs.1 54334 ESTs 

W26522 Hs.75890 gb:32g2 Human retina cDNA randomly prime 

N68168 gbzallcOLslSoaiesfetalliverspieen 

AI932995 Hs.183475 Homo sapiens clone 25061 mRNA sequence 

AW881145 gb:QV0-OT0033-010400-182-a07 OT0033 Homo 

H66741 Hs.38540 ESTs, Weakly similar to ALU4_HUMAN ALUS 



lo-lo-hi-lo 
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo

lo-lo-hi-lo



lo-lo-hi-lo
lo-lo-hi-lo
lo-lo-hi-lo
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407351 


AW383165 




gb;PM3-HT0344-1512994X)4-f07 HT0344 Homo 


Mo-hi-lo 


443231 W87548 Hs.132932 ESTs lo-lo-hi-lo

444001 AI095087 Hs.152299 ESTs, Moderately similar to S65657 alpha lo-lo-hi-lo

435064 T70740 Hs.31433 ESTs

435173 AW295645 Hs.255451 ESTs lo-lo-hi-lo

411831 AW994394 gb:RC3-BN0036-060400-014-h12 BN0036 Homo lo-lo-hi-lo

446572 AV659151 Hs.282961 ESTs lo-lo-hi-lo

428114 AI821648 Hs.98363 ESTs, Weakly similar to I38022 hypotheti lo-lo-hi-lo

406207 Target Exon

405011 Target Exon lo-lo-hi-lo


409451 


AF012525 


Hs.54472 


fragile X menial retardation 2 


io-lo-hi-lo 


411233 


AW833793 




gb:QV4-TT0008-130100-080-a06n0008 Homo 




455729 


BE072092 




gb;PM4-BT0532-160200O03-b11 Br0532Homo 


lo-lo-hi-io 


439454 


AA836120 


Hs.258958 


ESTs 


lo-lo-hi-io 


445124 


AI8064C3 


Hs.1 43942 


ESTs 


lo-lo-hi-io 


410324 AW292539 Hs.30177 ESTs

446548 AI769392 Hs.200215 ESTs

416999 Hs.21122 hypothetical protein FLJ11830 similar to

414553 AI813865 Hs.164478 hypothetical protein FLJ21939 similar to lo-lo-hi-lo

444647 H14718 Hs.11506 Human clone 23589 mRNA sequence lo-lo-hi-lo

418271 NM_000919 Hs.83920 peptidylglycine alpha-amidating monooxyg lo-lo-hi-lo

407939 W05608 Hs.312679 ESTs, Weakly similar to A49019 dynein lie lo-lo-hi-lo

432678 AI187366 gb:qf29c01.x1 Soares_testis_NHT Homo sap lo-lo-hi-lo

415156 X84908 Hs.78060 phosphorylase kinase, beta lo-lo-hi-lo

432679 AI1469S6 Hs.146723 ESTs, Weakly similar to A53950 transcrip lo-lo-hi-lo

412121 AB033061 Hs.73287 KIAA1235 protein lo-lo-hi-lo

418858 AW961606 Hs.21145 hypothetical protein RG083M05.2 lo-lo-hi-lo

425204 NM_002436 Hs.1861 membrane protein, palmitoylated 1 (55kD) lo-lo-hi-lo


418348 Hs.96322 hypothetical protein FLJ23560 lo-lo-hi-lo

410765 AI694972 Hs.66180 nucleosome assembly protein 1-like 2

445594 AW058463 Hs.12940 zinc-fingers and homeoboxes 1 lo-lo-hi-lo

416503 H98502 Hs.269853 ESTs lo-lo-hi-lo

AF039023 Hs.157496 RAN binding protein 6 lo-lo-hi-lo

451752 AB032997 KIAA1171 protein lo-lo-hi-lo

AW976438 Hs.17428 RBP1-like protein lo-lo-hi-lo

419872 AI422951 Hs.146162 ESTs lo-lo-hi-lo

443161 AI038316 gb:cx48c08.x1 Soares_total_fetus_Nb2HF8 lo-lo-hi-lo

445391 T92576 Hs.191168 ESTs lo-lo-hi-lo

443801 AW206942 Hs.253594 intron of: trichorhinophalangeal syndro

446706 AW807631 Hs.190488 Homo sapiens, Similar to nuclear localiz lo-lo-hi-lo

428172 U09367 Hs.182828 zinc finger protein 136 (clone pHZ-20)




421021 AA303018 Hs.109302 ESTs

431749 AL049263 Hs.306292 Homo sapiens mRNA; cDNA DKFZp564F133 (fr

423784 AK000039 Hs.132826 Homo sapiens cDNA FLJ14913 fis, clone PL lo-lo-hi-lo

419479 AI2ee34B Hs.23450 mitochondrial ribosomal protein S25 lo-lo-hi-lo

450900 H61005 Hs.37902 ESTs lo-lo-hi-lo

423396 AI382555 Hs.127950 bromodomain-containing 1 lo-lo-hi-lo

426137 AL040683 Hs.167031 DKFZP566D133 protein lo-lo-hi-lo

442012 AI733277 Hs.128321 ESTs lo-lo-hi-lo

452271 AA025976 Hs.34569 ESTs lo-lo-hi-lo

414882 D79994 Hs.77546 Homo sapiens cDNA: FLJ21983 fis, clone H lo-lo-hi-lo

432195 AJ243669 Hs.8127 KIAA0144 gene product lo-lo-hi-lo

430217 N47863 Hs.130450 ribosomal protein S24

429567 R35606 Hs.326800 Human EST clone 53123 mariner transposon lo-lo-hi-lo


438810 AW897846 Hs.6421 hypothetical protein DKFZp761N09121 lo-lo-hi-lo

436796 BE515260 Hs.5320 hypothetical protein

426352 N72324 Hs.55098 ESTs lo-lo-hi-lo

416308 F05251 gb:HSC04H101 normalized infant brain cDN lo-lo-hi-lo

420148 U34227 Hs.95361 myosin VIIA (Usher syndrome 1B (autosoma lo-lo-hi-lo

434442 AA737415 Hs.152826 ESTs lo-lo-hi-lo

449429 AA054224 Hs.59847 ESTs

410245 C17908 Hs.194125 ESTs lo-lo-hi-lo

421168 AF182277 Hs.330780 cytochrome P450, subfamily IIB (phenobar lo-lo-hi-lo

436237 R11528 Hs.271968 ESTs lo-lo-hi-lo

440668 AI989538 Hs.191074 ESTs lo-lo-hi-lo

422068 AI807519 Homo sapiens cDNA FLJ13694 fis, clone PL lo-lo-hi-lo

410216 BE051839 gb:RC1-BT0254-290100-015-a05 BT0254 Homo

439437 AI207788 Hs.343628 sialyltransferase 4B (beta-galactosidase lo-lo-hi-lo


417061 AI675944 Hs.188691 Homo sapiens cDNA FLJ12033 fis, clone HE lo-lo-hi-lo

403046 NM_005656*:Homo sapiens transmembrane pr lo-lo-hi-lo

AI912555 peptide YY, 2 (seminalplasmin)

439734 AC005013 Hs.149 cAMP response element-binding protein CR lo-lo-hi-lo

462997 N64777 Hs.44656 ESTs lo-lo-hi-lo

403745 ENSP00000226812*:KIAA1494 protein (Fragm lo-lo-hi-lo

411448 AA178955 Hs.271439 ESTs, Weakly similar to I38022 hypotheti lo-lo-hi-lo

422460 AW445014 Hs.197746 ESTs lo-lo-hi-lo

404058 Target Exon lo-lo-hi-lo

BE154067 Hs.136660 ESTs, Weakly similar to ZN91_HUMAN ZINC

427702 N75589 Hs.14454 ESTs, Weakly similar to TFIID subunit TA lo-lo-hi-lo

440695 AW088363 Hs.246240 ESTs lo-lo-hi-lo

424881 AL119690 Hs.153618 HCGVIII-1 protein lo-lo-hi-hi

440573 BE550891 Hs.270624 ESTs lo-lo-hi-hi



416659 W22048 Hs.64753 gb:61A12 Human retina cDNA Tsp509I-cleav lo-lo-hi-hi

436731 AA580691 Hs.180789 S164 protein lo-lo-hi-hi

405102 C15001220*:gi|4469558|gb|AAD21311.1|(AF lo-lo-hi-hi

450219 AI826999 Hs.224624 ESTs lo-lo-hi-hi

404527 AI912555 peptide YY, 2 (seminalplasmin) lo-lo-hi-hi

439158 ESTs lo-lo-hi-hi

431952 Hs.272240 Homo sapiens cDNA FLJ11086 fis, clone PL lo-lo-hi-hi

418584 NM_004606 Hs.1179 TATA box binding protein (TBP)-associate lo-lo-hi-hi

AW995948 Hs.182339 Homo sapiens pyruvate dehydrogenase kina lo-lo-hi-hi

410124 AW962229 Hs.128927 Homo sapiens cDNA FLJ13903 fis, clone TH lo-lo-hi-hi

435955 AA830515 Hs.222917 ESTs lo-lo-hi-hi


424001 W67883 paternally expressed 10 hi-hi-lo-lo

441399 AI630844 Hs.126919 ESTs hi-hi-lo-lo

440184 AB002297 Hs.7022 dedicator of cyto-kinesis 3 hi-hi-lo-lo

Hs.1460 glucagon hi-hi-lo-lo

444252 R21135 Hs.54985 ESTs hi-hi-lo-lo

402082 C18000743*:gil6678363ireflNP 033416.111 hi-hi-lo-lo

405396 C22000452*:gi|6981522|ref|NP_036781.1| r hi-hi-lo-lo

412457 T32587 Hs.170414 paired basic amino acid cleaving system hi-hi-lo-lo

R21439 Hs.334578 Homo sapiens, clone IMAGE:3929520, mRNA hi-hi-lo-lo

441494 AW452344 Hs.129977 ESTs hi-hi-lo-lo

437330 AL353944 Hs.50115 Homo sapiens mRNA; cDNA DKFZp761J1112 (f hi-hi-lo-lo

452784 BE463857 Hs.151258 hypothetical protein FLJ21062 hi-hi-lo-lo

410037 AB020725 H3^8009 KIAA0918 protein hi-hi-lo-lo

449145 AIG32122 Hs.198408 ESTs hi-hi-lo-lo


452487 AW207659 Hs.6630 Homo sapiens cDNA FLJ13329 fis, clone OV hi-hi-lo-lo

431031 AA830335 Hs.105273 ESTs hi-hi-lo-lo

427209 H05509 Hs.92423 KIAA1566 protein hi-hi-lo-lo

434280 BE005398 gb:CM1-BN0116-150400-189-h02 BN0116 Homo hi-hi-lo-lo

418236 AW994005 Hs.337534 ESTs

429201 X03178 Hs.198246

416653 AA768553 Hs.193145 metallothionein 1E (functional)

422501 AA354690 Hs.144967 ESTs

425087 R62424 Hs.126059 ESTs hi-hi-lo-lo

426798 AA385062 Hs.130260 ESTs hi-hi-lo-lo

443798 R07848 Hs.188522 ESTs

427254 AL121523 Hs.97774 ESTs hi-hi-lo-lo

431667 AI345227 Hs.106448 ESTs, Weakly similar to B34087 hypotheti

409963 M1335S0 H^'ussf calcium/calmodulin-dependent protein kin hi-hi-lo-lo


446006 NM_004403 deafness, autosomal dominant 5 hi-hi-lo-lo

418269 AA216404 ESTs hi-hi-lo-lo

410173 AA706017 Hs.119944 ESTs hi-hi-lo-lo

436023 T81819 Hs.302251 ESTs hi-hi-lo-lo

448428 AF282874 Hs.21201 nectin 3; DKFZP566B0846 protein hi-hi-lo-lo

430665 BE350122 Hs.157367 ESTs, Weakly similar to I78885 serine/th hi-hi-lo-lo

432559 AW452948 Hs.257631 ESTs hi-hi-lo-lo

451572 AA018556 Hs.268691 ESTs, Moderately similar to ALU2_HUMAN A hi-hi-lo-lo

456032 AW957446 Hs.301711 ESTs hi-hi-lo-lo

438209 AU20659 Hs.6111 aryl-hydrocarbon receptor nuclear transl hi-hi-lo-lo

438337 AK002058 Hs.6166 hypothetical protein FLJ11196 hi-hi-lo-lo

431795 AK002088 Hs.270124 Homo sapiens cDNA FLJ11226 fis, clone PL

421114 AW975051 Hs.293156 ESTs, Weakly similar to I78886 serine/th hi-hi-lo-lo


431843 AA516420 ESTs, Weakly similar to I38022 hypotheti

AW188311 ESTs

430105 X70297 Hs.2540 cholinergic receptor, nicotinic, alpha p hi-hi-lo-lo

439046 AA947354 gb:od86e1l5l NCI_CGAP_Ov2 Homo sapiens

451491 AI972094 Hs.286221 Homo sapiens cDNA FLJ13741 fis, clone PL hi-hi-lo-lo

452789 AW081626 Hs.242561 ESTs hi-hi-lo-lo

AI924228 Hs.115185 ESTs, Moderately similar to PC4259 ferri

449567 AI990790 Hs.188614 ESTs hi-hi-lo-lo

N21307 Hs.13477 ESTs, Weakly similar to 1207289A reverse hi-hi-lo-lo

409091 AW970386 Hs.269423 ESTs

435354 AA678267 Hs.117115 ESTs hi-hi-lo-lo

444809 BE207568 Hs.208219 oculospanin hi-hi-lo-lo

422170 AI791949 Hs.112432 anti-Mullerian hormone hi-hi-lo-lo

453582 AW854339 Hs.33476 hypothetical protein FLJ11937 hi-hi-lo-lo


435905 AW997484 Hs.5003 KIAA0456 protein hi-hi-lo-lo

443884 N20617 Hs.194397 leptin receptor hi-hi-lo-lo

430027 AB023197 Hs.227743 KIAA0980 protein hi-hi-lo-lo

432582 AI623817 Hs.168457 ESTs hi-hi-lo-lo

AW963705 Hs.301183 molecule possessing ankyrin repeats indu

444930 BEt85536 Hs.301183 molecule possessing ankyrin repeats indu hi-hi-lo-lo

427794 AA709186 Hs.99070 ESTs hi-hi-lo-lo

410913 AL050367 Hs.66762 Homo sapiens mRNA; cDNA DKFZp554A025 (fr hi-hi-lo-lo

431992 NM_002742 Hs.2891 protein kinase C, mu hi-hi-lo-lo

447846 AA324057 Hs.77955 Homo sapiens cDNA: FLJ23527 fis, clone L hi-hi-lo-lo

430439 AL133561 DKFZP434B061 protein

432621 AI298501 Hs.12807 ESTs, Weakly similar to T46428 hypotheti hi-hi-lo-lo

431427 AK000401 Hs.252748 Homo sapiens cDNA FLJ20394 fis, clone KA hi-hi-lo-lo

408872 AI476139 Hs.13291 hi-hi-lo-lo

453200 AA033832 Hs.212433 ESTs hi-hi-lo-lo

411529 AA430348 Hs.317596 Homo sapiens cDNA FLJ12927 fis, clone hi-hi-lo-lo
440049 R06599 Hs.19769 hypothetical protein MGC4174 hi-hi-lo-lo

429483 AA974832 Hs.128708 ESTs hi-hi-lo-lo

411296 BE207307 Hs.10114 growth suppressor 1 hi-hi-lo-lo

425188 AK002052 Hs.155071 hypothetical protein FLJ11190 hi-hi-lo-lo

436315 BE390513 Hs.27935 hypothetical protein MGC4837 hi-hi-lo-lo

400297 AI127076 Hs.306201 hypothetical protein DKFZp56401278 hi-hi-lo-lo

431089 BE041395 ESTs, Weakly similar to unknown protein hi-hi-lo-lo

418824 AW751661 Hs.53542 choreoacanthocytosis gene; KIAA0986 prot hi-hi-lo-lo

449226 AB002365 Hs.23311 KIAA0367 protein hi-hi-lo-lo

450149 AW969781 Hs.132863 Zic family member 2 (odd-paired Drosophi hi-hi-lo-lo

418443 NM_005239 Hs.85146 v-ets avian erythroblastosis virus E26 o hi-hi-lo-lo

458692 BE549905 Hs.231754 ESTs hi-hi-lo-lo

410102 AW248508 Hs.279727 ESTs; homologue of PEM-3 [Ciona savignyi hi-hi-lo-lo

451062 AL110125 Hs.25910 Homo sapiens mRNA; cDNA DKFZp564C1416 (f hi-hi-lo-lo

407633 NM_007069 Hs.37189 similar to rat HREV107 hi-hi-lo-lo

418941 AA452970 Hs.239527 E1B-55kDa-associated protein 5 hi-hi-lo-lo

407059 X95406 gb:H.sapiens cyclin E gene. hi-hi-lo-lo

455956 BE162704 gb:PM1-HT0454-30129W)01-d08 HT0454 Homo hi-hi-lo-lo

437763 AA469369 Hs.5831 tissue inhibitor of metalloproteinase 1 hi-hi-lo-lo

451404 M460775 Hs.8295 ESTs, Weakly similar to T1724B hypotheti hi-hi-lo-lo

428494 AA233439 Hs.184634 hypothetical protein FLJ20005 hi-hi-lo-lo

414957 D61283 Hs.45206 ESTs hi-hi-lo-lo

456415 AI734051 Hs.277102 ESTs, Weakly similar to ALU1_HUMAN ALU S hi-hi-lo-lo

400183 Eos Control hi-hi-lo-lo 

400158 ENSP00000244302*:CDNA FLJ11591 fis, clon hi-hi-lo-lo

403893 ENSP00000237058*:Protocadherin alpha 6 p hi-hi-lo-lo

423809 AI223833 Hs.154483 ESTs hi-hi-lo-lo

400170 Eos Control hi-hi-lo-lo

403291 Target Exon hi-hi-lo-lo

422026 U80736 Hs.110826 trinucleotide repeat containing 9 hi-hi-lo-lo

417130 AW276858 Hs.81256 S100 calcium-binding protein A4 (calcium hi-hi-lo-lo 

432472 AA548781 Hs.136418 ESTs hi-hi-lo-lo 

405231 C2001066;gi|10257425|ref|NP_0338921|CD hi-hi-lo-lo 

400141 Eos Control hi-hi-lo-lo 

428971 BE278404 Hs.285813 hypothetical protein FLJ11807 hi-hi-lo-lo 

422390 AW450a93 Hs.121830 ESTs, Weakly similar to T42682 hypotheti hi-hi-lo-lo

425538 BE270918 Hs.164026 Homo sapiens, clone IMAGE:3534875, mRNA, hi-hi-lo-lo

456972 AI054347 Hs.2017 ribosomal protein L38 hi-hi-lo-lo

456622 AF205849 Hs.107740 Kruppel-like factor 2 (lung) hi-hi-lo-lo

418515 AI568453 Hs.19487 ESTs, Weakly similar to CNIH_HUMAN CORNI hi-hi-lo-lo

448439 BE613082 Hs.28229 ARG99 protein hi-hi-lo-lo

445418 AW139377 Hs.127179 cryptic gene hi-hi-lo-lo

402559 Z23024 Rho GTPase activating protein 1 hi-hi-lo-lo

402676 Z23024 Rho GTPase activating protein 1 hi-hi-lo-lo

420811 AA807544 ESTs, Weakly similar to B34323 GTP-bindi hi-hi-lo-lo

446627 AI97301S Hs.15725 hypothetical protein SBBI48 hi-hi-lo-lo

400247 Eos Control hi-hi-lo-lo

430269 AK001952 Hs.238039 hypothetical protein FLJ11090 hi-hi-lo-lo

400133 Eos Control hi-hi-lo-lo

418816 T29621 Hs.88778 carbonyl reductase 1

433579 BE264473 Hs.284297 hypothetical protein from EUROIMAGE 1967

401952 Target Exon

410349 AW663021 Hs.323445 ESTs, Weakly similar to T2D3_HUMAN TRANS

417558 AF045229 Hs.82280 regulator of G-protein signalling 10

446861 AW007332 Hs.10450 Homo sapiens cDNA: FLJ22063 fis, clone H

404489 Target Exon 

405802 Target Exon 

456266 L29073 Hs.198726 cold shock domain protein A 

457133 M54968 v-Ki-ras2 Kirsten rat sarcoma 2 viral on 

459330 C16931 gb:C16931 Clontech human aorta polyA mRNA

433041 BE265848 Hs.289080 colon cancer-associated protein Mic1

446545 AI431798 Hs.164192 ESTs, Weakly similar to Y151_HUMAN HYPOT

414911 NM_000107 Hs.77602 damage-specific DNA binding protein 2 (4

414682 AL021154 Hs.76884 inhibitor of DNA binding 3, do

422311 AF073515 Hs.114948 cytokine receptor-like factor 1

447329 BE090517 ESTs, Moderately similar to ALU8_HUMAN A

412942 AL120344 Hs.75074 mitogen-activated protein kinase-activat

420747 BE294407 Hs.99910 phosphofructokinase, platelet

431912 AI660552 Hs.76549 ESTs, Weakly similar to A56154 Abl subst

446506 AI123118 Hs.15159 chemokine-like factor, alternatively spl

408633 AW963372 Hs.46677 PRO2000 protein

433676 AW977653 Hs.75319 ribonucleotide reductase M2 polypeptide

424560 AA158727 Hs.150555 protein predicted by clone 23733

425234 AW152225 Hs.165909 ESTs, Weakly similar to I38022 hypotheti

439816 AA206079 Hs.6693 hypothetical protein FLJ20420

410174 AA306007 Hs.59461 DKFZP434C245 protein

410442 X73424 Hs.63788 propionyl Coenzyme A carboxylase, beta p

429190 H18650 Hs.92602 ESTs

423619 T48691 Hs.249159 adrenergic, alpha-2A-, receptor
AW753676 Hs.39982

R74441 Hs.117176

AF151879 Hs.26706

BE245374 Hs.27842

AI963747 Hs.18573

AW163267 Hs.106469

NM_014641 Hs.277585

NM_003816 Hs.2442

BE262677 Hs.283558

AW075485 Hs.286049

AL136877 Hs.50758

AI469788 Hs.165190

AW342140 Hs.182545

T24968 Hs.23038

AI654133 Hs.30212

AL133353 Hs.16606

NM_003542 Hs.46423

AW408163 Hs.58488

AW939251 Hs.25647

AA460479 Hs.321707

AL117424 Hs.25035

AW956103 Hs.51712

AI432163 Hs.268231

NM_001674 Hs.460

T16206 Hs.237164

AI738719 Hs.198427

BE250162 Hs.83765

X54942 Hs.33758

H73444 Hs.394

AA016188 Hs.111244

AI752235 Hs.41270

AW968613 Hs.79428

AW582256 Hs.91011

AA305599 Hs.238205

F13272 Hs.111334

AA347720 Hs.122669

AA339801 Hs.239138

AA421404 Hs.346668

408089 H59799 Hs.42644

409690 W45393 Hs.55888

442332 AI693251 Hs.8248

408388 AF091086 Hs.44563

441252 AW360901 Hs.183047

433069 X76732 Hs.3164

443837 AI984625 Hs.9884



452092 
447425 
421654 
432502 



439574 
438182 
449103 
421059 
446939 
408576 
410073 
450912 
434701 
450455 
451144 
427390 
451831 

428157 

418203 
449338 
422082 

416655 

434094 
443951 
422975 
430314 
412664 



441181 
447397 
427505 
430287 
415857 
423198 
407687 
431374 
413273 
442799 
443881 



S.121076 



439453 BE264974 



421921 
422938 
427719 
422283 
424840 
418216 
412140 



poly(A)-binding protein, nuclear 1
CGI-121 protein
hypothetical protein FLJ11210
acylphosphatase 1, erythrocyte (common)
suppressor of var1 (S.cerevisiae) 3-like
KIAA0170 gene product
a disintegrin and me
hypothetical protein PRO1855

SMC4



ESTs

ESTs, Weakly similar to ALU1_HUMAN ALU S
HSPC071 protein

thyroid receptor interacting protein 15

CGI-32 protein

H4 histone family, member G

catenin (cadherin-associated protein), a

v-fos FBJ murine osteosarcoma viral onco

KIAA0742 protein

chloride intracellular channel 4

pyruvate dehydrogenase kinase, isoenzyme

Homo sapiens cDNA: FLJ23111 fis, clone L

activating transcription factor 3

ESTs, Highly similar to LDHH_HUMAN L-LAC

hexokinase 2

dihydrofolate reductase

CDC28 protein kinase 2

adrenomedullin

hypothetical protein

procollagen-lysine, 2-oxoglutarate 5-dio
BCL2/adenovirus E1B 19kD-interacting pro
anterior gradient 2 (Xenopus laevis) hom
hypothetical protein PRO2013
ferritin, light polypeptide



AA622037 
AA416925 

BE247676 Hs.18442 

AA361562 Hs.178761 

AW182459 Hs.125759 

AAB66115 Hs.127797 

M81933 Hs.1634

AK002011 Hs.37558 

BE258532 Hs.251871 

U75679 Hs.75257 

AI564739 Hs.68505 ESTs 

R64512 Hs.237146 

AA235776 Hs.79078 

BE543205 Hs.288771 



activating transcription factor 7 

Target CAT 
hypothetical protein 
hypothetical protein MGC4399 
nucleobindin 2 
spindle pole body protein 
programmed cell death 5
peptidylprolyl isomerase (cyclophilin)-l
E-1 enzyme

26S proteasome-associated pad1 homolog
ESTs, Weakly similar to LEU5_HUMAN LEUKE
Homo sapiens cDNA FLJ11381 fis, clone HE
cell division cycle 25A
hypothetical protein FLJ11149
CTP synthase 

stem-loop (histone) binding protein 



.J12752 



Hs.75616
Hs.25199 
Hs.6566 



AF098158 Hs.9329 

C18825 Hs.29191

Z17805 Hs.93564

T55979 Hs.115474

BE267931 Hs.78996

AW975531 Hs.154443

U45258 Hs.339665

NM_014214 Hs.5753

AC002563 Hs.15767

AF129535 Hs.272027

BE276112 Hs.7165

H83363 Hs.6820

NM_001809 Hs.1594

AI393122 Hs.134726 

AW411307 Hs.114311 

D79987 Hs.153479 

AA662240 Hs.283099 

AA219691 Hs.73625 



hypothetical protein FL 
MAD2 (mitotic arrest deficient, yeast, h 
DKFZP586A0522 protein 
kinesin-like 6 (mitotic centromere-assoc

hypothetical protein 

thyroid hormone receptor interactor 13 

pituitary tumor-transforming 1

3 20 open reading frame 1 



Homer, neuronal immediate early gene, 2
replication factor C (activator 1) 3 (38
proliferating cell nuclear antigen 

" ideffcient(S. 



inositol(myo)-1 (or 4)-monophosphatase 2 

citron (rho-interacting, sc'--'"-— '- 
F-box only protein 5 
zinc finger protein 7" 



hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi



hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi



hi-lo-lo-hi



hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi



hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi



hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi



CDC45 (cell division cycle 45, S.cerevis
418322 AA284166 Hs.84113 cyclin-dependent kinase inhibitor 3 (CDK hi-lo-lo-hi

428479 Y00272 Hs.334562 cell division cycle 2, G1 to S and G2 to hi-lo-lo-hi

449722 BE280074 Hs.23960 cyclin B1 hi-lo-lo-hi

417933 X02303 Hs.82962 thymidylate synthetase hi-lo-lo-hi

433001 AF217513 Hs.279905 clone HQ0310 PRO0310p1 hi-lo-lo-hi

413943 AW294416 Hs.144687 Homo sapiens cDNA FLJ12981 fis, clone NT hi-lo-lo-hi

424905 NM_002497 Hs.153704 NIMA (never in mitosis gene a)-related k hi-lo-lo-hi

422765 AW409701 Hs.1573 baculoviral IAP repeat-containing 5 (sur hi-lo-lo-hi

425397 J04088 Hs.156346 topoisomerase (DNA) II alpha (170kD) hi-lo-lo-hi

444371 BE540274 Hs.239 forkhead box M1 hi-lo-lo-hi

422956 BE545072 Hs.122579 ECT2 protein (Epithelial cell transformi hi-lo-lo-hi

444783 AK001468 Hs.62180 anillin (Drosophila Scraps homolog), act hi-lo-lo-hi

453884 AA355925 Hs.35232 KIAA0186 gene product hi-lo-lo-hi

416980 AA381133 Hs.80684 high-mobility group (nonhistone chromoso hi-lo-lo-hi

442432 BE093539 Hs.38178 hypothetical protein FLJ23468 hi-lo-lo-hi

417308 H60720 Hs.81892 KIAA0101 gene product hi-lo-lo-hi

433133 AB027249 Hs.104741 PDZ-binding kinase; T-cell originated pr hi-lo-lo-hi

432626 AA471098 Hs.278544 acetyl-Coenzyme A acetyltransferase 2 (a hi-lo-lo-hi

441020 W79283 Hs.35962 ESTs hi-lo-lo-hi

412281 AI810054 Hs.14119 ESTs hi-lo-lo-hi

435602 AF217515 Hs.283532 uncharacterized bone marrow protein BM03 hi-lo-lo-hi

400882 Target Exon hi-lo-lo-hi

446269 AW263155 Hs.14559 hypothetical protein FLJ10540 hi-lo-lo-hi

417847 AI521558 Hs.7331 hypothetical protein FLJ22316 hi-lo-lo-hi

400881 NM_025080:Homo sapiens hypothetical prot hi-lo-lo-hi

419356 AI656166 Hs.7331 hypothetical protein FLJ22316 hi-lo-lo-hi

400292 AA250737 Hs.72472 BMP-R1B hi-lo-lo-hi

415539 AI733881 Hs.72472 BMP-R1B hi-lo-lo-hi

453935 AI633770 Hs.42572 ESTs hi-lo-lo-hi

420005 AW271106 Hs.133294 ESTs hi-lo-lo-hi

428450 NM_014791 Hs.184339 KIAA0175 gene product hi-lo-lo-hi

436291 BE563452 Hs.344037 protein regulator of cytokinesis 1 hi-lo-lo-hi

441362 BE614410 Hs.23044 RAD51 (S. cerevisiae) homolog (E coli Re hi-lo-lo-hi

428484 AF104032 Hs.134601 solute carrier family 7 (cationic amino hi-lo-lo-hi

418526 BE019020 Hs.35838 solute carrier family 16 (monocarboxylic hi-lo-lo-hi

468809 AW972612 Hs.20985 sin3-associated polypeptide, 30kD hi-lo-lo-hi

444984 H15474 Hs.132898 fatty acid desaturase 1 hi-lo-lo-hi

447342 AI199268 Hs.19322 Homo sapiens, Similar to RIKEN cDNA 2010 hi-hi-lo-lo

428330 L22524 Hs.2256 matrix metalloproteinase 7 (matrilysin, hi-hi-lo-lo

428336 AA503115 Hs.183752 microseminoprotein, beta- hi-hi-lo-lo

430389 AL117429 Hs.240845 DKFZP434D146 protein hi-hi-lo-lo

417318 AW953937 Hs.240845 ESTs hi-hi-lo-lo

422545 X02761 Hs.287820 fibronectin 1 hi-hi-lo-lo

417640 D30857 Hs.82353 protein C receptor, endothelial (EPCR) hi-b-lo-lo

422809 AK001379 Hs.121028 hypothetical protein FLJ10549 hi-lo-lo-hi

425580 L11144 Hs.1907 galanin hi-lo-lo-hi

416836 D54745 Hs.8a247 cholecystokinin hi-lo-lo-hi

434170 AA626509 Hs.122329 ESTs hi-lo-lo-hi

427958 AA418000 Hs.98280 potassium intermediate/small conductance hi-lo-lo-hi

439706 AW872527 Hs.59761 ESTs, Weakly similar to DAP1_HUMAN DEATH hi-lo-lo-hi

450088 AW292933 Hs.254110 ESTs hi-lo-lo-hi

414219 W20010 Hs.75823 ALL1-fused gene from chromosome 1q hi-lo-lo-hi

419201 M22324 Hs.1239 alanyl (membrane) aminopeptidase (aminop hi-lo-lo-hi

426263 AI908774 Hs.259785 carnitine palmitoyltransferase I, liver hi-lo-lo-hi

456236 AF045229 Hs.82280 regulator of G-protein signalling 10 hi-lo-lo-hi

456607 AI660190 Hs.106070 cyclin-dependent kinase inhibitor 1C (p5 hi-lo-lo-hi

408437 AW957744 Hs.278469 lacrimal proline rich protein hi-lo-lo-hi

421180 BE410992 Hs.258730 heme-regulated initiation factor 2-alpha hi-lo-lo-hi

413437 BE313164 Hs.75361 gene from NF2/meningioma region of 22q12 hi-lo-lo-hi

432415 T16971 Hs.289014 ESTs, Weakly similar to A43932 mucin 2 p hi-lo-lo-hi

449230 BE613348 Hs.211579 melanoma cell adhesion molecule hi-lo-lo-hi

417979 AU077284 Hs.83081 GTP cyclohydrolase I feedback regulatory hi-lo-lo-hi

421877 AW250380 Hs.109059 mitochondrial ribosomal protein L12 hi-lo-lo-hi

412482 AI499930 Hs.334885 mitochondrial GTP binding protein hi-lo-lo-hi

428423 AU076517 Hs.184276 solute carrier family 9 (sodium/hydrogen hi-lo-lo-hi

422947 AA306782 Hs.122552 G-2 and S-phase expressed 1 hi-lo-lo-hi

441072 AW275480 Hs.39504 hypothetical protein MGC4308 hi-lo-lo-hi

415938 BE383507 Hs.78921 A kinase (PRKA) anchor protein 1 hi-lo-lo-hi

432278 AL137506 Hs.274256 hypothetical protein FLJ23563 hi-lo-lo-hi

446651 AA393907 Hs.97179 ESTs hi-lo-lo-hi

431515 NM_012152 Hs.258583 endothelial differentiation, lysophospha hi-lo-lo-hi

445345 AW003850 Hs.12532 chromosome 1 open reading frame 21 hi-lo-lo-hi

458965 AA010319 Hs.60389 ESTs hi-lo-lo-hi

438321 AA576635 Hs.6153 CGI-48 protein hi-lo-lo-hi

416783 AA206186 Hs.79889 monocyte to macrophage differentiation-a hi-lo-lo-hi

453563 AW608906 Hs.181163 hypothetical protein MGC5629 hi-lo-lo-hi

432393 AW205863 Hs.133988 hypothetical protein FKSG28 hi-lo-lo-hi

433914 AF108138 Hs.112160 Homo sapiens DNA helicase homolog (PIF1) hi-lo-lo-hi

414907 X90725 Hs.77597 polo (Drosophila)-like kinase hi-lo-lo-hi

432375 BE536069 Hs.2962 S100 calcium-binding protein P hi-lo-lo-hi

440773 AA352702 Hs.37747 Homo sapiens, Similar to RIKEN cDNA 2700 hi-lo-lo-hi

415994 UUJXSm Hs.78944 regulator of G-protein signalling 2, 24k hi-lo-lo-hi
412722
446839 
428862 
439108 
430178 
421733 
452410 
430132 
428297 
413142 
427239 

410748 
424506
447333
414761



426006 
467465 
406867 
407230 



439186 
424544 
431325 
414922 
438291 
418574 
409342 
432734 
436087
420309
411619
424381
442547 
430376 
434666 
412330 
452123 
424893 
428057 
431566 
439979 
418836 
433757 



AI343300 Hs.15091

BE091926 Hs.16244 

NM_000346 Hs.2316 

AW163034 Hs.6467 



M204686 
AA236291 
M81740 



Hs.234149 
Hs.133583
Hs.75212 



Hs.303116 
Hs.28988
Hs.22627



mitotic spindle coiled-coil related prot
SRY (sex determining region Y)-box 9 (ca
synaptogyrin 3
ESTs

fibroblast growth factor receptor 3 (ach

Homo sapiens mRNA; cDNA DKFZp434E2321 (f

hypothetical protein FLJ20647

serine (or cysteine) proteinase inhibito

ornithine decarboxylase 1

ubiquitin carrier protein

insulin induced gene 1

chromosome 1 open reading frame 21

group III secreted phospholipase A2

hypothetical protein dJ616B8.3

enhancer of zeste (Drosophila) homolog 2

hypothetical protein

stromal cell-derived factor 2-like 1



BE300296 

AW043637 Hs.21766 

AI418609 Hs.71040 

AA285249 Hs.146329

AA306997 Hs.217484

AW292053 Hs.12532

AF151103 Hs.112259

NM_005100 Hs.788

AI267615 Hs.38022

AW295112 Hs.153648

AI343641 Hs.185798

AF176012 Hs.260720

AW600291 Hs.6823

AI655499 Hs.161712

AI949974 Hs.152670

AW067800 Hs.155223

AW963419 Hs.155223 



ESTs 



AA157857 Hs.182265

AJ003624 Hs.15896

BE206854 Hs.46039

AI697274 Hs.105435

M88700 Hs.150403

AW026751 Hs.5794

D00723 Hs.77631 

BE514605 Hs.289092 
N23754 

AU077058 Hs.54089 

AA837396 Hs.263925 



phosphoglycerate mutase 2 (muscle)
GDP-mannose 4,6-dehydratase
dopa decarboxylase (aromatic L-amino aci
ESTs, Weakly similar to 2109260A B cell
glycine cleavage system protein H (amino
Homo sapiens cDNA: FLJ22380 fis, clone H
M-phase phosphoprotein 9
BRCA1 associated RING domain 1
LIS1-interacting protein NUDE1, rat homo
CGI-133 protein

ESTs, Weakly similar to ALU5_HUMAN ALU S
hypothetical protein FLJ20425
protein kinase Chk2

ESTs, Weakly similar to ALU1_HUMAN ALU S

chromosome 1 open reading frame 21

T cell receptor gamma locus

A kinase (PRKA) anchor protein (gravin)

ESTs

Homo sapiens cDNA FLJ13303 fis, clone OV



hypothetical protein FLJ10430



hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi



hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi



hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi



hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
hi-lo-lo-hi
Pkey: Unique Eos probeset identifier number
CAT number: Gene cluster number
Accession: Genbank accession numbers



Pkey CAT Number
403660 107294_1
409051 109699_1
409123 110143_1
410216 1184664_1
410451 1204118_1
410498 120611_1
411053 1230446_1
411233 1236369_1
411283 1237666_1
411701 1254466_1
411831 1260400_1
412419 1293418_1
413509 1374313_1
413672 1382512_1
415308 1533673_1
415516 1539185_1
416631 1605019_1
416954 163427_1
417314 1666649_1
418056 171841_1
418259 173388_1
418574 17690_1
421974 209807_1
422128 211994_1
423028 224062_1
423476 22861_1
423895 233006_1
424593 241234_1
425074 246486_1
425291 249618_1
425980 258778_1
426413 266650_1
428181 287953_1
429163 300543_1
429540 305828_1
430068 312849_1
430103 313089_1
430439 31808_1
431089 327825_1
431843 338324_1
432079 341114_1
432340 345248_1
432676 352582_1
433075 35820_1
434280 382816_1
434609 38950_1
435023 39809aL.1
436716 425440_1
436862 42814_2
437576 43892_1
438869 46661_1
438882 466649_1
438980 467544_1
439046 468133_1
439848 477806_1
440151 487109_1
440507 495S77_1



AA525775 AA056342 AI538978 AW975281 AA664986 

AA080912 AA075318 AA083403AA076594 AA078992 AA0B4926AA081881 AA113913 AA113892 AA083821 AA134801 AA082953 AA070343 
AA062835 AA075419 AA063293 AA071252 AA078900 AA062836 AW974306 
AA063403 AA070823 AA070050 



BE065687 BE065637 AW749002 H73690 
AA355749 AA085520 AW966333 AA34031 9 BE170936 
AW815061 H71965AW815072AW315048AW815041 AW815047BE152] 
AW333793 AW833799 AWa33346 AW833371 AW833795 AW833562 AW833667 AW833377 
AW852754 AW852897 AW852757 AW852617 BE172755 AW835444 
BE181659 AW890576 AW857638 

AW994394 AW865900 AW8659C5 AW865891 AW866014 AW865898 

AW943630 AW948626 AW948634AW948616AW948627AW948615 AW948631 AWg48605 AW948B11 AW948610AW948633 AW948623 

AW948628 AW948604 AW948602 AW948607 

AW962604AA368639AA1 12257 

AW976165CC4000 

BE086815 BE086823 1^1218 R6922g 

BE145419BE145433 

BE156536 BE156439 BE156700 BE156449 BE156653 BE156533 BE156524BE156670 BE156721 BE156723 

FQ5251 R13748 Z44028H14747 

F11411 R15237 Z43915 H20760 

R39769T53143 H60012 

H69466 H93884 N59684 

AI222358 N73390 D61648 AA243520 AA190953 

N68168 N69188 N90450 

AA524886AW971347AA211537 

AA21 5404 A1990909 B E4641 32 AW271 459 N74332 A1262061 

N28754 N28747 AI568146 AI979339 AA322671 AA322e72 AW955043 AI990326 AA776406 AI01 6250 AA843678 AW451882 N23137 N23129 

W70061 AI038748 AA831327 AI925845 AW945895 

AA244416AA244401 

AA807544 AA280648 AI243056 AI022744 AA705288 AA829425 AW4S2095 AI92931 7 R19039 AA282024 

AL041520AA300086 

AA301270AA301379AA301366 

AW881 1 45 AA49071 8 M85637 AA304575 T06067 AA331991 

H90946 AA320597 AW954970 BE143680 

AL035533 F1 1 794 F1 1 783 H 1 8042 T66089 H29379 R19493 AW134660 AI299437 AL133995 AA057405 N78357 AA9174S0 AI002692 T09262 

T65008 H29290AI200374AA894415AI732887AI791768A1733447AA988785 N62128T09261AW956936 

AA332215AA403110AW965299 

AA343729 AA345779 AA344370 

AA495930 A1470890 H97831 AA350358 BE166712 

AA354572AWaS2361 AW813419 AW816041 AI744949 

AA366951AA470999AA469425 

AA377823 AW9544g4 AI022688 

AA423976 AA437075 BE006469 

AA884766 AW974271 AA592975 AA447312

U85776AA454535AA456208 H9018g 

AA464964MB5405AA947566 

AA465259AW897142AWB97144 

AL133561 AL041090AL117481 AL1 22069 AW439292A1968826 
BE041 395 AA491826 AA621 946 AA715980 AA666102 

AA516420C14818C14815C15161 C15068 D80763 D606S6AW970134AA543007 D81004 D60184AI498371 D60382 D60181 01 5876 

AW972746AA525323AI150314 

AA534222AA632632T81234 

AI187366AA553869AA618478 

NM_002959 X98248 AA233278 AA846376 AI470560 AI470533 BE327147 AW291971 AA017125 AI198417 AI365213 AI168442 AI337018
AI475049 H35459 AASOgsgs AA888000 AA41 8326 AA41 8378 N71981 AL043634 AA426361 AA418275 AA232975 AL036B61 BE277220 BE387S05 
N9971 0 AW375004 AA418268 AL079651 H85743 AW902319 AW805907 AA984366 T92310 AA405425 AA421732 A1656841 AW300968 
AW5934ie T92267 BE464032 AW473548 AI359502 BE552306 AI990ig6 AW518351 AI239559 AW590963 AA01836g AI273737 AL042658 
AA411303 AA40281C H33111 AW013931 AW366432AW752435AW376124A1292020A1292121 AA340647 BE613672BE409874AA351915 
BE617026 BE019588 AW4C2692 AW247466 RS9233 AA134761 BE254019 BE265105 D63316 BE313080 BE547713 BE536578 BES46749 
AA324185 H17386 BE253377 R87598 H29072 AA350980 BEa76629 BE253g57AA532613 BE252486AW804459 030966 R8795g AA091832 
BE005398AA628622AA994155 
R76593AF147390 R76594 

AI6g2552 AI393343 AI300510 AI37771 1 F24263 AA661 876 
A1433540AA728984AA804981 

AI82ig40N67106AI744264AAB08846AA643417AA&43416 Z70715 

BE514383 AA071273 AW247987 AW673286 BE312102 AW749824 BE07ig85 AW577383 BE07ig45 BE072005 AW577356 BE071965 AW239231 
BE072000 BE071960 AW577360 AW749830 AW373020 X97303 AW999522 BE000192 BE562219 BE266655 BE264970 

AF075009 R6310g R53068 
AA827695 AA833754 AW978946 
AW502384 A1982587 AA828822 
AA947354AA82g660AI68729B 
AW97924g D63277AA846g68 
AAS68167 F21S58 F31418 F35624 
H06994BE1478g8 
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441102 509604_1 

442048 531432_1

443161 561305_1

444290 59994_1

444314 600667_1

445308 65133_1

447329 71759_1

447448 722246_1

448150 752165_1

448489 765247_1

448631 772996_1

448738 77790_1

452410 9163_1



454775 1234106_1

455019 1249138_1

455272 1271871_1

455653 1348742_1

455729 1353792_1

455824 1372880_1

465956 1387163_1

456123 1534442_1

457133 29066_1



12BE546579AM21321 



458956 83645_1



AA9739C5AI299333AA917019H63235 T90771 
AA9746C3AI984319AW340495 
AI038316AI344631 AI261653 
AA262496 AV648929 AA305356 D61644 D78724 
AI140497 AW749625 AW749626 AW749644 
AV655234 AW966332 AA340239 

BE090517 AW970792 AW264490AW0149B5 F27436AA947336F15843 H89338 A 
AW025245 

B E244285 C1 8429 H42373 A1820706 AI379786 R55439 AW276142 

AI472167AI99Q315R32175 
AI523875 R45732 R45781 
AI554923AI9C2356 
BE614081 W01988AW500790 

AL1 3361 9 AA4681 1 8 AA383064 A1476447 T09430 AI673758 AA524895 AI581 345 Al 300820 AW49881 2 AA2561 62 A! 559724 Al 685732 AA602400 
AA905453AI204595AW166541AA157456AA156269AA383652AA431072AW592707AI435410AW272464A1215594AA622747 R74039 
N35031AI804128AW513621 AA868351 AI026826AI493388AA614641 W81604AI567080 AI2U351 AA730140 A1125754 AI200813 AI269603 
AI565082 AI807095 AI475529 AA609909 AI368449 AI686077 AI582930 AW085038 AA767863 AA730154 AI767072 AA46831 6 AI7341 30 A17341 38 
AA426284AA433997 AI741241 AVW)43563A1732741 AI732734AA437369AA425B20 AA664048 R74130 
BE144022BE14396gBE143915 
BE(X)4783 BE004g47 AlSlirSO 

BE160229 AW819879 AW820179 AW819882 AW819876AW820169 BE153201 AW993736 BE152911 

AW85081 8 AW850833 AW851100 

BE148152 BE148133 BE148159BE148132AW835107 

BE063853 BE063955 BE063866 BEQ63705 BE063846 BE061416 BE063844 

BE154075 BE153973 BE064861 BE153852 BE153847 BE064684 BE153602 BE065075 BE154018 BEC64772 BE064842 BE153557 BE153509 

BEC72092 BE0721C6 BE072086 BE072093 BE072103 
BE143703 BE143631 BE143629 BE143702 
BE162704 BE1627C5 BE162732 BE162702 BE162694 
R00602 Z42921 F06132 

M54968 NM.004985 AI808924 AL1 351 30 AW242010 AA476a48 AI740449 M17087 KC3210 M35505 M35504 L00049 AI1B65B5 W35273 X016e9 
X02825W23635 AI55492CAI539465AA425263AI469981 W21091 T28976AW977922 BE5501B0AW664973AI148939AW1 17295 AA81 1229 
AI343010AA766141 BE219368 N95249AA230396AW504574AA232870AI770018AA262948AW4S0230AW362890AW6C9417AW499941 
AA425857 AW380665 AA830647 AA282180 T27356 H85307 AA861543 AA356548 AA356410 AWB60656 AW860647 AW938103 AW860649 
AI56701 6 N70374 AW474707 AA605084 AA082195 AW949515 AA361728 N33863 AA411 821 AA401640 AW694461 AL120766 AI500024 
AW771 891 H84567 D51551 AA330460 R1 41 84 AI301629 N64676 AV659669 AI697660 AI004579 AA287927 AW453052 AW601 642 AA676681 
AA737010 AA872481 AA281094AA564243 BE464958 BE049265 AW167917 AA843916 AA525301 AI015987 N25230A1B89481 AW173466 
AA937541 AI334416 AI676214 AI281159 M553669 AA582189 M255527 AW1S0515 AA670007 H08199 AA80B271 AA281015 W47527 AA649252 
AI364302 AA889246 R40473 H02312 AA54B1 1 6 AA342730 M2';3624 R99351 R415B8 R49696 AAB54442 F01713AA2136B5 AA721296 R79B33 
H84241 R70668 H85554 AA223758 N95349 AI374913 AI306683 AA015B09 AA91B548 A1453570 AA772321 A1692775AA195733 AI474563 
AW873048 A12091 33 AI028182 AI374920 AW672807 AA406223 AA833684 T97255 H6913B AA3B2906 AW1 1 91 62 N31974 AI890684 N3941 3 
AA864877 AA679469 BE350651 N4102DAI050915 F00075 AA86487B N26970 AAB2BB9B AW019991 AW796631 AW993262 N48532 BE564662 
AV654063AI754461 AW94S712C03289 AV655314AV659070AV65980BAV660435 H70113C05323 R91984 H96949AV65B936AV658879 
H69137AA3B4411 AA412584 C02749 W32014 R58168 C05526BE536017 N24354AA2B79gi NB0109 F05452 R12740 H0B297 AL138354 
AW020B01 BE17B443 BE17e01B BE17B336 BE17B360 BE17B107 BE176385 BE17B215 BE1781B6 BE17B447 BE17B352 BE17B422 BE17B424 
BE178043 BE178093 BE178460 BE17B356 BE178441 BE178438 BE1 78467 AI091 259 BE177B39 BE178094 R28455 BE177B44 BE178100 
AA2623B7R70669 W80934 W93668AA256711BE17B141BE177893BE17B449AA167718 H69694BE178017BE17B02gBE177999BE177936 
AA095144 N32462 AA2B1203 AA2811B3 W47526 W05C15 R34165 R35396 T97366 R79640 W25258 R99450 AW368425 BE17B196 R26447 
C03146 C03683 

U25750AI792472AA487379A1B722B2AA487262 R22383A1865750 R21832AA59362BAW571B69AA377191 R7B814T27193 
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TABLE 2C 

Pkey: Unique number corresponding to an Eos probeset

Ref: Sequence source. The 7 digit numbers in this column are Genbank Identifier (GI) numbers. "Dunham I. et al." refers to the publication entitled "The DNA sequence of human chromosome 22." Dunham I. et al. (1999) Nature 402:489-495.

Strand: Indicates DNA strand from which exons were predicted.
Nt_position: Indicates nucleotide positions of predicted exons.



Pkey Ref Strand 

400481 8439853 Plus 

400501 9796227 Minus 

400713 8113374 Minus 

400769 8131628 Plus 

400818 8569994 Plus 

400881 2842777 Minus 

400882 2842777 Minus 
400965 7770576 Minus 
400986 8085497 Minus 
400995 8099094 Plus 
401093 8516137 Minus 
401178 9433616 Minus
401192 9719502 Minus 
401209 7712287 Plus 
401405 7768126 Minus 
401416 7452889 Minus 
401419 7452889 Minus 
401444 8346725 Plus 
401512 7622346 Plus 
401 5S3 B247910 Plus 
401500 4388746 Minus 
401750 9828651 Plus 
401757 7239630 Plus 
401839 7656637 Plus 
401849 7770425 Plus 
401952 3319121 Minus 
401966 3126781 Plus 
402032 8117478 Minus 
402101 8117697 Plus 
402106 8131652 Plus 
402163 8568936 Plus 
402185 8576002 Plus 
402240 7690131 Plus 
402249 7704953 Minus 
402347 3099267 Minus 
402396 1905896 Plus 
402469 9797107 Minus 
402532 9800951 Minus 
402559 9864273 Plus 
402575 9884830 Minus 
402602 7239666 Plus 
402758 9213869 Plus 
402786 9715046 Plus 
402807 6456148 Minus 
402810 6010110 Plus 
402964 9581599 Minus 
403046 3540153 Minus 
403055 8748904 Minus 

403217 7630969 Plus 

403218 763096S Plus 
403291 7230870 Plus 
403328 8469086 Minus 
403654 8736093 Minus 
403704 4982546 Minus 
403708 5705981 Minus 
403725 7534031 Plus 

403739 7630882 Plus 

403740 7630882 Plus 

403745 7652036 Minus 

403746 7652036 Plus 
403885 7710403 Minus 
403893 7710581 Minus 
403947 7711923 Plus 
404039 8698763 Plus 
404054 3548785 Plus 
404058 3548785 Plus 
404108 8247074 Minus 
404211 5006246 Plus 
404277 1834458 Minus 
404384 8887026 Minus 
404407 7329316 Minus 
404489 8113772 Plus 
404527 8152087 Plus 



Nt_position

112433-112541 

12479-12619 



28671-29795 

172644-172765,173085-173200 

91446-91603,92123-92265 

110431-110708 

173043-173564 

63140-63319 

141186-141601 

22335-23166 

133663-133812 

69559-70101 



69276-69452,69548-69958 

121456-121626 

136389-136508 

90895-90994,93070-93213 

136399-136557 

27363-27513,28727-28891,29526-29731

82143-82270,89284-89373,90696-90770,95822-96001,96688-96775,96870-96992,98046-98138 
88641-88751 

1016-1086,2751-2967,3241-3348,26677-26831 



134308-134487,135402-135587,136421-136548

3717-3848 

166996-167119 

25486-25639 

104382-104527,106136-106372 

107636-107813,108694-108824,110435-110502,113182-113386 

13714-15440 

4426-4648 

71266-72351 

180240-180558 



109742-109883 

6785-6972,7478-7575 

87638-87924 



55707-55859,55369-56511 

109532-110225 

54089-54163,55427-55623 

58039-58149 

95177-95435 

120428-120703 

28634-28758 

8850-8996 

134394-134812 

86737-86843 

44563-44766,48209-48483,52255-52495 

86504-87227 
67610-68002 
93612-93887 



5435-7846 
38657-38817 
81889-82011 
66713-69175 
99397-101808 

185728-185885,194575-194686 
91665-91946 

38055-38156,42175-42391,43435-43553 

48154-48499 

98183-98480 

127737-127796,128080-128210,129888-130054,132545-132869 
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405387 6587915 
405396 6624129 
405429 7321905 



134573-134678 
109793-109969 

119867-120372,120481-120824,121029-121357 



51577-51723

51704-51841,53581-53767 

99136-99313 

51198-51314 

19699-19828 

38944-39213 

71907-72080 
Table 3A shows the Seq ID No, Pkey, ExAccn, UnigeneID, and Unigene Title for all of the sequences in Table 4.

Pkey: Unique Eos probeset identifier number
ExAccn: Exemplar Accession number
UnigeneID: Unigene number
Unigene Title: Unigene gene title

Seq ID No: Seq ID number correlation for those sequences in Table 4



409051 
409123 
415787 



UnigeneID
Hs.72472
Hs.22785 



AW963372 
AA525775 
AA080912 
AA063403 



Unigene Title 
BMP-RIB 

gamma-aminobutyric acid (GABA) A recepto 
NM_001076*:Homo sapiens UDP glycosyltran 
PRO2000 protein

ESTs, Moderately similar to PC4259 ferr
gb:zn04d03.r1 Stratagene hNT neuron (937
gb:zm04d12.s1 Stratagene corneal stroma



Pkey     ExAccn     UnigeneID   Unigene Title
415999   AA172179   HS594029    ESTs
416226   AA577730   Hs.188684   ESTs, Weakly similar to PC4259 ferritin
420757   X78592     Hs.99915    androgen receptor (dihydrotestosterone r
429163   AA884766               gb:aiii20a10.s1 Soares_NFL_T_GBC_S1 Hom
         AJ224172   Hs.204096   lipophilin B (uteroglobin family member)
431099   Y13367     Hs.249235   phosphoinositide-3-kinase, class 2, alph
432432   AA541323   Hs.115831   ESTs
432435   BE2188e6   Hs.282070   ESTs
432527   AW975028   Hs.102754   ESTs
435876   AW612586   Hs.160271   G protein-coupled receptor 48
438233   W52448     Hs.56147    ESTs
439569              Hs.222399   CEGP1 protein
440819   AI809444   Hs.202108   ESTs
442832   AW206560   Hs.253569   ESTs
447342   AI199268   Hs.19322    Homo sapiens, Similar to RIKEN cDNA 2010
447499   AW262580   Hs.147674   protocadherin beta 16
451411   AA017492   Hs.135655   EST
451720   AW970985   Hs.2908S3





Seq ID No 
Seq ID No 1 & 2
Seq ID No 3-10 
Seq ID No 11 & 12 
Seq ID No 13 & 14 
Seq ID No 15 & 16 
Seq ID No 17 
Seq ID No 18 
Seq ID No 19-21 
Seq ID No 22 
Seq ID No 23 
Seq ID No 24 & 25 
Seq ID No 26 



Seq ID No 32 & 33
Seq ID No 34
Seq ID No 35 & 36
Seq ID No 37-40
Seq ID No 41 & 42
Seq ID No 43
Seq ID No 44
Seq ID No 45 & 46
Seq ID No 47 & 48
Seq ID No 49
Seq ID No 50 & 51



WO 02/098358    PCT/US02/17594



Table 3B shows the accession numbers for those Pkey's lacking UnigeneID's for table 3A. For each probeset is listed gene cluster number from which oligonucleotides were designed. Gene clusters were compiled using sequences derived from Genbank ESTs and mRNAs. These sequences were clustered based on sequence similarity using Clustering and Alignment Tools (DoubleTwist, Oakland California). Genbank accession numbers for sequences comprising each cluster are listed in the "Accession" column.

Pkey CAT Number Accession

408660 107294_1 AA525775 AA056342 AI538978 AW975281 AA664986
409051 109699_1 AA080912 AA075318 AA083403 AA076594 AA078992 AA084926 AA081881 AA113913 AA113892 AA083821 AA134801 AA082953 AA070343 AA062835 AA075419 AA063293 AA071252 AA078900 AA062836 AW974305
409123 110143_1 AA063403 AA070823 AA070050
429163 300543_1 AA884766 AW974271 AA592975 AA447312
Table 3C shows genomic positioning for those Pkey's lacking Unigene ID's and accession numbers in table 3A. For each predicted exon is listed genomic sequence source used for prediction. Nucleotide locations of each predicted exon are also listed.

Pkey Ref Strand Nt_position
403740 7630882 Plus 86504-87227
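
The Pkey / Ref / Strand / Nt_position layout used in Tables 2C and 3C can be read mechanically. The following Python sketch is illustrative only and is not part of the disclosed methods: the names ExonRow, parse_nt_position, parse_row and exon_lengths are hypothetical, and the length calculation assumes that each start-end pair is an inclusive coordinate range, which the tables do not state explicitly.

```python
from typing import List, NamedTuple, Tuple


class ExonRow(NamedTuple):
    """One table row (hypothetical container, not from the source)."""
    pkey: int                      # Eos probeset number
    ref: int                       # Genbank Identifier (GI) number
    strand: str                    # "Plus" or "Minus"
    exons: List[Tuple[int, int]]   # (start, end) pairs exactly as printed


def parse_nt_position(field: str) -> List[Tuple[int, int]]:
    """Split an Nt_position field such as 'a-b,c-d' into [(a, b), (c, d)]."""
    exons = []
    for part in field.split(","):
        start, end = part.strip().split("-")
        exons.append((int(start), int(end)))
    return exons


def parse_row(line: str) -> ExonRow:
    """Parse a whitespace-separated 'Pkey Ref Strand Nt_position' row."""
    pkey, ref, strand, nt_position = line.split(None, 3)
    return ExonRow(int(pkey), int(ref), strand, parse_nt_position(nt_position))


def exon_lengths(exons: List[Tuple[int, int]]) -> List[int]:
    """Exon lengths, ASSUMING inclusive start and end coordinates."""
    return [end - start + 1 for start, end in exons]


# The single row of Table 3C.
row = parse_row("403740 7630882 Plus 86504-87227")
print(row.strand, row.exons, exon_lengths(row.exons))
```

Rows whose Nt_position field lists several exons, for example 172644-172765,173085-173200, parse into one (start, end) pair per exon.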
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11111 

C GGAGTCGGCG GGGCCTCGCG GGACGCGGGC AGTGCGGAGA CCGCGGCGCT 
GAGGACGCGG GAGCCGGGAG CGCACGCGCG GGGTGGAGTT CAGCCTACTC TTTCTTAGAT 
GTGAAAGGAA AGGAAGATCA TTTCATGCCT TGTTGATAAA GGTTCAGAC? TCTGCTGATT 
CATAACCATT TGGCTCTGAG CTATGACAAG AOAGGAAACA AAAAGTTAAA CTTACAAGCC 
TGCCATAAGT GAGAAGCAAA CTTCCTTGAT AACATGCTTT TGCGAAC 
AATGTGGGCA CCAAGAAAGA GGATGGTGAG AC-TACAGCCC CmCCCCCCG T 

GGTTGCCTGT 



CTGCCTCCAT TGAAAAACAG AGATTTTGTT GATGGACCTA T 

ATATCTGTGA CTGTCTGTAG TTTGCTCTTG GTCCTTATCA TATTATTTTG TTACTTCCGG 720 

TATAAAAGAC AAGAAACCAG ACCTCGATAC AGCATTGGGT TAGAACAGGA TGAAACTTAC 780 

ATTCCTCCTG GAGAATCCCT GAGAGACTTA ATTGAGCAGT CTCRGAGCTC AGGAAGTGGA 840 

TCAGGCCTCC CTCTGCTGGT CCftAAGGACT ATAGCTAAGC AGATTCAGAT GGTGAAACftG 900 

ATTG6AAAAG GTCGCTATGG GGAAGTTTGG ATGGGAAAGT GGCGTGGCGA AAAG6TAGCT 960 

GIGAAAGTGT TCITCACCAC AGAGGAAGCC AGCTGGTTCA GAGAGACAGA AATATATCAG 1020 

ACAGTGTTGA TGAGGCATGA AAACATTTTG GGTTTCaTTG CTGCAGATAT CRAAGGGACA 1080 

QGGTCCTG6A CCCAGTTGTA CCTAATCACA GACTATCATG AAAATGGTTC CCTTTATGAT 1140 

TATCTGAAGT CCaCCACCCT AGAOSCTAAA TCAATGCTGA AGTTAGCCTA CTCTTCTGTC 1200 

AGTGGCTTAT GTCATTTACA CACAGAAATC TTTAGTACTC AAGGCSAACC AGCAATTGCC 1260 

CATCGAGATC TGAAAAGIAA AAACATTCTG GTGAAGAAAA ATGGAACTTG CTGTATTGCT 1320 

GACCTGGGCC TGGCTGTTAA ATTTATTAGT GATACAAATG AAGTTGACAT ACCACCTAAC 1380 

ACTCGAGTTG GCACCAAACG CTATATGCCT CCAGAAGTGT TGGACGAGAG CTTGAACAGA 1440 

AATCACTTCC AGTCTTACAT CATG6CTGAC ATGTATAGTT TTGGCETCAT CCTTTGGGAG 1500 

GTTGCTAGGA GATGTGTATC AGGAGGTATA GTGGAAGAAT ACCAGCTTCC TTATCATGAC 1560 

CTAGTGCCCA GTGACCCCTC TTATGAGGAC ATGAGGGAGA TTGTGTGCAT CRAGAAGTTA 1620 

CGCCCCTCAT TCCCAAACCG GTGGAGCAGT GATGAGIGTC TAAGGCAGAT GGGAAAACTC 1680 

ATGACAGAAT GCTGGGCTCA CAATCCTGCA TCAAGGCTGA CAGCCCTGCG GGTTAAGAAA 1740 

ACACTTGCCA AAATGTCAGA GTCCCAGGAC ATTAAACTCT GATAGGAGAG GAAAAGTAAG 1800 

CATCTCTGCA GAAAGCCAAC AGGTACTCTT CTGTTTGTGG GCAGAGCAAA AGACATCAAA 1860 

TAAGCATCCA CAGTACAAGC CTTGAACATC GTCCTGCTTC CCAGTGGGTT CAGACCTCAC 1920 

CITTCAGGGA GCGACCTGGG CAAAGACAGA GAAGCTCCCA GAAGGAGAGA TTGATCCGTG 1980 
TCTGTITGTA GGCGGAGAAA CCGTTGGGTA ACTTGTTCAA GATATGATGC AT 

Seq ID NO: 2 Protein sequence
Protein Accession #: NP_001194

1 11 21 31 41 51

I 



DSGLPWTSG CLGLEGSDFQ CRDTPIPHQE R 

GPIHHRALLI SVTVCSLLLV LIILFCYFRY KRQE7RPRYS IGLSQDETYI P 
EQSQSSOSGS GLPLLVQRTl AKQICJMVKQI GKGRYGEVWM GKNRGEKVAV KVFFTTEEAS 
WPRETEIYQT VLMEHENILG PIAADIKGTG SWTQLYLITD YHENGSLYDY LKSTTLDAKS 
' 3 GLCHLHTEIP STQGKPAIAH R 

CKRYMPP EVLDESLNRN HFQSYIMADM Y: 
EEYQLPYHDL VPSDPSYEDM REIVCIKKLR PSFPNRWSSD ECLRC»<GKLM TECWAHNPAS 480 
RLTALRVKKT LAKMSBSQDI KL 

Seq ID NO: 3 DNA sequence 

Nucleic Acid Accession #: NM_004961.2

Coding sequence: 55..1575

1 11 21 31 41 51 

I I I I I 1 

GCCAGAGCGT GAGCCGCGAC CTCCGCGCAG GTGGTCGCGC CGGTCTCCGC GGAAATGTTG 60 

TCCAAAGTTC TTCCAGTCCT CCTAGGCATC TTATTGATCC TCCAGTCGAG GGTCGAGGGA 120 

CCTCAGACTG AATCAAAGAA TGAAGCCTCT TCCCGTGATG TTGTCTATGG CCXCCAGCCC 180 

CAGCCTCTGG AAAATCAGCT CCTCTCTGAG 6AAACAASGT CAACIGA6AC T6AGACTGGG 240 

AGCAGAGTTG GCAAACTGCC AGAAGCCTCT CGCATCCTGA ACACTATCCT GAGTAATTAT 300 

GACCACRAAC TGCGCCCTGG CATTGGAGAG AAGCCCACTG TGGTCACTGI TGAGATCGCC 3 60 

GTCAACAGCC TTGGTCCTCT CTCTATCCTA GACATGGAAT ACACCATTGA CATCATCTTC 420 

TCCCAGACCT GGTACGACGA ACGCCTCTGT TACAACGACA CCTTTGAGTC TCTTGTTCTG 480 

AATGGCAATG TGGTGAGCCA GCTATGGATC CCGGACACCT TTTTTAGGAA TTCTAAGAGG 540 

ACCCACGAGC ATGAGATCAC CRTGCCCAAC CAGATGGTCC GCATCTACAA GGATGGCAAG 600 

GTGTTGTACA CAATTAGGAT GACCATTGAT GCCGGATGCT CACTCCACAT GOTCAGATTT 660 

CCAATGGATT CTCACTCTTG CCCTCTATCT TTCTCTAGCT TTTCCTATCC TGAGAATGAG 720 

ATGATCTACA AGTGGGAAAA TTTCAAGCTT GAAATCMTG AGAAGAACTC CTGGAAGCTC 780 

TTCCAGTTTG ATTTTACAGG AGTGAGCAAC AAAACTGAAA TAATCACAAC CCCAGTTGGT 840 

GACTTCATGG TCATGACGAT TTTCTTCAAT GTGAGCAGGC GGTTTGGCTA TGTTGCCTTT SOO 

CAAAACTATG TCCCTTCTTC CGTGACCAOS ATGCTCTCCT GGGTTTCCTT TTGGATCAAG 960 

ACAGA6TCTG CTCCAGCKCG GACCTCTCTA GGGATCaCCT CTGTTCTOAC CATGACCACG 1020 

TTGGGCACCT TTTCTCGTAA GAATTTCCCG CGTGTCTCCT ATATCACAGC CTTGGATTTC 1080 

TATATCGCCA TCTGCTTOST CTTCTGCTTC TGCGCTCTGT TGGAGTTTGC TGTGCTCAAC 1140 
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45 



CG CCAACATCAG 1260 

GAAGCTTTTG TGTGCCSGAT TGTCaCCACT GftGGGARGTG ATGGftGAGGA GCGCCGGTCT 1320 

TGCTCRGCCC AGCaGCCCCC TAGCCCAGGT AGCCCTGAG6 GTCCCCGCSG CCTCTGCTCC 1380 

AAGCrGGCCT GCTGTGAGTG GTGCRAGCGT TTTAAGAAGT ACTTCTGCaT GGTCCCEGAT 1440 

TGTGAGGGCA GTACCTGGCA GCAGGGCOSC CTCTGCRTCC ATGTCTACCG CCTGGATAAC 1500 




ACTTTCCCAG TGACTTCCCC TAGCCCTGAC CCAGGCACTA GGCCTTGGTG ACTTCCTGGG 2460 

GCCAAGAAAC TAAGGAAACT CGGCTTTGCA ACftGGCATTA CTCGCCATTG AITGGTGCCC 2520 

ACCCAGGGCA CACTGTCGGA GTTCTATCAC TTGCTTGACC CCTGGACCCA TAAACCAGTC 2580 

CACTGTTATA CCCGGGGCAC TCTAACCATC ACAATCAATC AATCftAATIC CCITAAATTT 2640 

GTATGGCACT GGAACTTTGG CAAAGCACTT TTGACAAGTT GTGTCTGATT GGAGCTTCAT 2700 

GATAGCCTTG TGACATCTTT AGGGCAGGAT TCTTATCCCC ATTTTGCAGA TGAAAACCCT 2760 

GAGTCACAGA TTTCTGTGGG ACTGTGGATC TCaCTGGAAG CTATCCAAGA GCXCACTGTC 2820 

ACCTTCTAGft CCftCATGATA GGGCTAGACA GCICAGTTCA CCMGATTCT CTTCTGTCAC 2880 

CTCTGCTGGC ACACCAGTGG CAAGGCCCAG AATGGCGACC TCTCTTTAGC TCAATTTCTG 2940 

GGCCTGAGGT GCTCAGACTG CCCCCAAGAT CAAATCTCTC CTGGCTGTAG TAACCCAGTG 3000 

GAATGAATTT GGACATGCCC CAATGCTTCT ATATGCTAAG TGAAATCTGT GTCTGTAATT 3 060 
T6TTGGGGGG TGGATAGGGT GGGGTCTCCA TCTACTTTTT GTCACCATCA T 
G6AAATATGT AAATAAATAT ATCAGCAAA6 CAAAAAGAAA AAAAAAAA 
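
The "Coding sequence" field of each DNA record gives a start..end range within the listed sequence. The Python sketch below is an illustration only: it assumes the range is 1-based and inclusive (the usual Genbank convention, which the listing does not state), and the helper names parse_cds, cds_length and extract_cds are hypothetical. It uses only the first 60 bases printed for Seq ID NO: 3 to check that the stated start position falls on an ATG codon.

```python
from typing import Tuple


def parse_cds(field: str) -> Tuple[int, int]:
    """Parse 'start..end' (stray spaces tolerated) into two integers."""
    start, end = field.replace(" ", "").split("..")
    return int(start), int(end)


def cds_length(field: str) -> int:
    """Length of the coding range, ASSUMING 1-based inclusive coordinates."""
    start, end = parse_cds(field)
    return end - start + 1


def extract_cds(sequence: str, field: str) -> str:
    """Slice the coding range out of a sequence printed in 10-base groups."""
    start, end = parse_cds(field)
    bases = "".join(sequence.split())  # drop spaces and line breaks
    return bases[start - 1:end]


# First printed line (positions 1-60) of Seq ID NO: 3.
first_line = "GCCAGAGCGT GAGCCGCGAC CTCCGCGCAG GTGGTCGCGC CGGTCTCCGC GGAAATGTTG"

# Positions 55-57 should hold the start codon if the assumption is right.
print(extract_cds(first_line, "55..57"))

# A coding range that ends on a stop codon should be a multiple of 3.
for field in ("55..1575", "572..1657", "22..1614"):
    print(field, cds_length(field), cds_length(field) % 3 == 0)
```

Because the listed sequences contain character-recognition errors, a full translation of a record from this text would not be reliable; the sketch therefore checks only coordinates and the start codon.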

Seq ID NO: 4 Protein sequence



40 1 I 1 1 I 1 

MLSKVLPVIiL GILLILQSRV EGPQTESKNE ASSRDWYGP QPQPLBNQKL SEETKSTETE 

TGSRVGKLPE ASRILKTILS NVDHKLRPGI GEKPTWTVE lAVNSLGPLS ILDMEVTIDI 

IFSQTWYDBR LCiTNDTFESL VLKGNWSQL WIPDTFFMrS KRTHEHEITM PHQMVRIYKD 

GKVLYTIRMT IDAGCSLHML RFPMDSHSCP LSFSSFSYPE NEMIYKWENF KLEINEKNSW 

KLFQPDFTGV SNKTEIITTP VGDFMVMTIF FNVSRRFGYV AFQNYVPSSV TTMLSWVSFW 

IKTESAPART SLGITSVLTM TTLGTFSRKN PPRVSYITAL DFYIAICFVF CFCALLEFAV 

LNFLIYNQTK AHASPKLRHP RINSRAHART EARSRAC3UIQ HQEAFVCQIV TTEGSDGEER 

PSCSAQQPPS PGSEEGPRSL CSKLACCEWC KRFKKYPCMV PDCEGSTWQQ GRLCIHVYRL 

DNYSRWFPV TPFFFIIVLYW LVCLNL 



GCCAGAGCGT GAGCCGCGAC CTCCGCGCAG GTGGTCGCGC CGGTCTCCGC O 
CAGAGAAGTG CTCAAATCAT AAGTGTACAG CTGATGAGTT GTCAAAAAAT GACCACAGCG 



CACTGCCTCC CAGCAAAGGC AGCACTATCC GGACTTCTAA CACCRTCGGG TCGAGGGACC 3 00 

TCAGACTGAA TCAAAGAATG AAGCCTCTTC CCGTGATGTT GTCTATGGCC CCCAGCCCCA 3 60 

GCCTCTGGAA AATCAGCTCC TCTCTGAGGA AACAAAGTCA ACTGAGACTG AGACTGGGAG 42 0 

CAGAGTTGGC AAACTGCCAG AAGCCTCTCG CATCCTGAAC ACTATCCTGA GTAATTATGA 48 0 

CCACAAACTG CGCCCTGGCA TTGGAGAGAA GCCCACTGTG GTCACTGTTG AGATCTCCGT S40 

CAACAGCCTT GGTCCTCTCT CIATCCTAGA CATGGAATAC ACCATTGACA TCATCTTCTC 600 

CCAGACCTGG TACGACGAAC GCCTCTGTTA CAACGAtaCC TTTGAGTCTC TTGITCTGAA 660 

TGGCAATGTG GTGAGCCAGC TATGGATCCC GGACACCTTT TTTAGGAATT CTAAGAGGAC 720 

CCACGAGCAT GAGATCACCA TGCCCaACCa GATGGTCXGC ATCTACAAGG ATGGCASGGT 780 

GTTGTACACA ATTAGGATGa CCATTGATGC CGGATGCICA CTCCACATGC TCAQATTTCC 840 

AATGGATTCT CACTCTTGCC CTCTATCTTT CTCTAGCTTT TCCTATCCTG AGAATGAGAT 900 

GATCTACAAQ TGGGAAAATT TCaAGCTTGA AATCAATGAG AAGAACTCCI GGAAGCTCTT 960 

CCAGTTGGAT TTTACAGGAG TGSIGCAACAA AACTGAAATA ATCACAACCC CAGTTGGTGA 1020 

CTTCATGGTC ATGACGATTT TCTTCAATGT GAGCAGGCGG TTTGGCTATG TTGCCTTTCA 1080 

AARCTATGTC CCTTCTTCCG TGACCaCGAT GCTCTCCIGG GTTTCCTTTT GGATCAAGAC 1140 

AGAGTCTGCT CCAGCCCGGA CKTCTCTAGG GATCACCTCT GTTCTQACCA TGACCACGTT 1200 

GGGCACCTTT TCTCGTAAGA ATTTOCCGCG TGTCTCCTAT ATCACAGCCT TGGATTTCTA 1260 

TATCGCCATC TGCITCGTCT TCTGCITCTG CGCTCTGTTG GAGTTTGCTG TGCTCAACTT 1320 

CCTGATCTAC AACCAGACAA AAGOXATGC TTCTOCTAAA CTCCGCCATC CTCGTATCAA 1380 

TAGCCGTGCC CATGCCOSTA CCCGTGCACG TTCCCBAGCC TGTGCCCGCC AACATCAGGA 1440 

AGCTTTTGTG TGOCAGATTG TCACCACTGA GGQRAGTGAT GGAGRGGAGC GCCCGTCTTG 1500 

CTCAGCCCAG CAGCCCCCTA GCCCRGGTAG CCCTGAGGGT CCCCGCAGCC TCTGCTCCAA ISSO 

GCTGGCCTGC TGTGAGTGGT GCAAGCGTTT TAAGAAGTAC TTCTGCATGG TCCCCGATTG 1620 






TGAGGGCAGT ACCTGGCAGC AGGCCCGCCT CTGCATCCAT GTCTACCGCC TGGATAACTA 1680 






CCTTAACTTG TAGGTACCAG CTGGTACCCT GTGGGGCAAC CTCTCCAGT 

TTTTTCCTGC CCCATTCCCC AAACS.GAAGC TTGCAGAGOG TTTGTCTTTG CTGCCCCTCT 

CCCCTACCTG GCCCATTCAC TGAGTTTTCT CAGCAGACCA T7TCAAATTA TTAATAAATG 
GGCCACCTCC CTCTTCTTCA A 



CTATACCTGG 2220 



AGATTATTAT GTTCTCAGTT CTCTCTCCCT GCTACCCCTT T 

yiATTJG GGACAGCATT 2400 

G ACATCTCCCT CTCCTTGCTG GCTCCATCTT TCGTCTGC&C 2460 

TACCAATTCA ATGCCCTTCA TCCAATGGGT ATCTATTTTT GTGTGTGATT ATAGTAACTA 2520 

CTCCCTGCTT TATATGCCAC CCTCITCCTT CTCTTTGACC CCTGTGACTC TTTCTGTAAC 2580 

TTTCCCAGTG ACTTCCCCTA GCCCTGACCC AGGCACTAGG CCTTGGTGAC TTCCTGGGGC 2640 

CAAGAAACTA AGGAAACTCG GCTTTGCAAC AGGCATTACT CGCCRTTGAT TGGTGCCCAC 2700 

CCAGGGCACa CTGTCGGAGT TCTATCACTT GCTTGACCCC TGGACCCATA AACCAGTCCA 2760 

CTGTTATACC CGGGGCACTC TAACCATCRC AATCAATCAA TCAAATTCCC TTAAATTTGT 2820 

ATGGCaCTGG AACTTTGGCA AAGCACTTTT GACAAGTTGT GTCTGATTGG AGCTTCRTGA 2380 

TAGCCTT6TG ACATCTTTAG GGCAGGATTC TTATCCCCAT TTTGCAGATG AAAACCCTGA 2940 

GTCACKGATT TCIGTGGGAC TGTQGATCIC ACTGGARGCT ATCCRAGAGC CCACTGTCAC 3000 

CTTCTAGACC ACRTGATAGG GCTAGACAGC TCafiTTCACC ATGATTCTCT TCTGTCaCCT 3060 

CTGCTGGCAC ACCAGTGGCA AGGCCCAGAA TGGCGACCTC TCTTTAGCTC AATTTCTGGG 3120 

CCTGAGGTGC TCAGACTGCC CCCftAGATCA AATCTCICCT GGCTGTAGTA ACCCAGTGGA 3180 

ATGAATTTGG ACATGCCCCA ATGCITCTAT AIGCTAAGTG AAATCTGTGT CIGTAATTTG 3240 

TTGGGGGGTG GATAGGGTGG GGTCTCCATC TACTTTTTGT CACCSVTCATC TGAAATGGGG 3300 
AAATATGTAA ATAAATATAT CAGCAAAGC 

Seq ID NO: 6 Protein sequence 
Protein Accession #: NP_0S8B19.1

1 11 21 31 41 51 

I I I I I I 

MEYTIDIIPS QTWVDERLCY KDTFESLVLN GNWSQLWIP DTFFRNSKRT HBHBITMPNQ 60 

MVRIYKDGKV LYTIRMTIDA GCSLHMLRPP I1DSHSCPLSF SSFSYPEWEM lYKWENFKLE 120 

SRRPGYVAFQ NYVPSSVTTM 180 

VSYITAUDFY lAICFVFCFC 240 

ALLEPAVLKF LIYNQTKAHA SPKLRHPRIN SHAHARTRAH SEACARQHQE AFVCQIVTTE 300 

GSDGEBRPSC SAQQPPSPGS PBGPRSLCSK LACCEWCKEF KKYFCMVPDC BGSTWQQARL 360 
CIHVYRLDNY SRWFPVTFF FFNVLYWLVC LNL 

Seq ID NO: 7 DNA sequence

Nucleic Acid Accession #: NM_021987.1

Coding sequence: 572..1657

1 11 21 31 41 51

I I I I I I 

GCCAGAGCGT GAGCCGCGAC CTCCGCGCAG GTGGTCGCGC CGGTCTCCGC GGAAATGTTG 60 

TCCAAAGTTC TTCCAGTCCT CCTAGGCATC TTATTGATCC TCCAGTCGAG AACATGTATA 120 

CAGAGAAGTG CTCAAATCAT AAGTGTACAG CTGATGAGTT GTCAAAAAAT GACCACAGCG 180 

GTGTAAAGAA AGCCAAATCA AGGACCCGAA TGTGAGCAGG ACCTCAGAAG CCCCCTTTGT 240 

CACTGCCTCC CAGCAAAGGC AGCACTATCC GGACTTCTAA CACCATCGGG TCGAGGGACC 300 

TCAGACTGAA TCAAAGAATG AAGCCTCTTC CCGTGATGTT GTCTATGGCC CCCAGCCCCA 360 

GCCTCIGGAA AATCAGCTCC TCTCIGAGGA AACAAAGTCA ACTGAGACTG AGACTGGGAG 420 

CAGAGTTGGC AAACTGCCAG AAGCCTCTQG CATCCTGAAC ACTATCCTGA GTAATTATGA 480 

CCACAAACTG CGCCCTGGCA TTGGAQAGRA GCCCACTGTG GTCACTGTTG AGATCTCCGT 540 

CAACAGCCTT GGTCCTCTCT CTATCCTAGA CATGQAATAC ACCATTGACA TCaTCTTCTC 600 

CCAGACCTGG AATTCTAAGA GGAOCCAOSA GCATGAGATC ACCATGCCCA ACCA6ATGGT 660 

CCGCATCTAC AAGGATGGCA AGGTGTTGTA CACAATTAGG ATGACCATTG ATGCCGGATG 720 

CTCACTCCAC ATGCTCAGAT TTCCS^ATGGA TTCTCACTCT TGCCCTCTAT CTTTCTCTAG 780 

CTTTTCCTAT CCTGAGAATG AGATGATCTA CftAGTGGGAA AATTTCAAGC TTGAAATCAA 840 

TGAGRAGAAC TCCTGGARGC TCTTrCAGTT TGATTTTACA GGAGTGAGCA ACaAAACTGA 900 

AATAATCACA ACCCCAGTTG GTGACTTCaT GGTCATGACG ATTTTCTTCA ATGTGAGCA6 960 

GCGGTTTGGC TATGTTGCCT TTCAAAACTA TGTCCCTTCT TCCGTGACCA CGATGCTCTC 1020 

CTGGGTTTCC TTTTGGATCA AGACAGAGTC TGCTCCaGCC CGGACCTCTC TAGGGATCAC 1080 

CTCTGTTCTG ACCATGACCA CGTTGGGCAC CTTTTCTCGT AAORATTTCC CGCGTGTCTC 1140 

CTATATCACA GCCTTGGATT TCTATATCGC CATCTGCTTC GTCTTCTGCT TCTGCGCTCT 1200 

I GCIGTGCTCA ACTTCCTGAT CTACAACCAG ACAAAAGCCC ATGCTTCTCC 1260 

C CATCCTCGTA TCAATAGCCG TGCCCATGCC CGTACCCGTG CACGTTCCCG 1320 

AGCCTGTGCC CGCCAACATC AGGAAGCTTT TGTGTGCCAG ATTGTCACCA CTGAGGGAAG 1380 

3 GAGCGCCCGT CTTGCTCAGC CCAGCAGCCC CCTAGCCCAG GTAGCCCTGA 1440 

Z AGCCTCTGCT CCAAGCTGGC CTGCTGT6AG TGGTGCAAGC GTTTTAAGAA 1500 

C ATGGTCCCCG ATTGTGAGGG CAGTACCTGG CAGCAGGGCC GCCTCTGCAT 1560 

CCATGTCTAC CGCCTGGATA ACTACTCGAG AGTTGTTTTC CCAGTGACTT TCTTCTTCTT 1620 

CAATGTGCTC TACTGGCTTG TTTGCCTTAA CTTGTAGGTA CCAGCTGGTA CCCTGTGGGG 1680 

CAACCTCTCC AGTTCCCCAG GAGGTCCAAG CCCCTTGCCA AGGGAGTTGG GGGAAAGCAG 1740 

CAGCAGCAGC AGGAGCGACT AGAGTTTTTC CTGCCCCATT CCCCAAACAG AAGCTTGCAG 1800 

AGGGTTTGTC TTIGCTGCCC CTCTCCCCIA CCTGGCCCAT TCACTGAGIT TTCTCAGCAG 1860 

h ATIATTAATA AATGGGCCAC CTCCOTCTTC TTCAAG6AGC ATCCGTGATG 1920 









T GCCTTTGCAG TGCTITCGGC CCAGTTCTGG CCTCAGCCTC A 
GACTAGTTGC TTGCCTATAC CTGGCACCTC ATTAAGATGC TGGGCAGCAG TATAACAGGA 
GGAAGAGATC CCTCTCCTTT GGTCAGATTA TTATGTTCTC AGTTCTCTCT CCCTGCTACC 
CCTTTCTCTG CAGATAGATA GACACTGGCA TTATCCCTTT AGGAAGAGGG GGGGGCAGCA 






T GACTTCCTGG GGCCAAGAAA CTAAGGAAAC TCGGCTTTGC AACAGGCATT 
ACTCGCCATT GATTGGTGCC C 
CCCTGGACCC ATAAACCAGT C 
CAATCAAATT CCCTTAAATT TGTATGGCAC TGGAACTTTG GCAAAGCACT TTTGACAAGT 2760 
TGTGTCTGAT TGGAGCTTCA Ti 



GCTATCCAAG AGCCCACTGT CACCTTCTAG ACCACATGAT AGGGCTAGAC AGCTCAGTTC
ACCATGftTTC TCTTCTGTCA CCTCIGCTGG CACACCAGTG GCAAGGCCCA GAATGGCGAC 
CTCTCTTTAG CTCAATTTCT GGGCCTGAGG TGCTCAGRCT GCCCCCAAGft TCAfiATCTCT 
CCTGGCTGTA GTAACCCftGT GGAATGAATT TGGACATGCC CCAATGCTTC TATATGCIAA 
GTGAAATCTG T6TCTGTAAT TTGTTGGGGG GTGGATSGGG TGGGGICTCC ATCTACTTTT 
TGTCACCATC ATCI6AAAT6 GG6AAATAT6 TAAATAAATA lATCAGCAAA GC 



I I I I I I 

MEYTIDIIFS QTWNSKRIHE HEITMPNQMV RIYKDGKVLY IIBMTIDAGC SLHMLRFEHD 
SHSCFLSFSS FSYFENBMIY KWEHFKLEIN EKHSWKLFQF DFTGVSNKTE IITTFVGDFM 
VMTIFFNVSR RFGYVAFQNY VPSSVTTMLS WVSFWIKTBS AEARTSLGIT SVLTMTTLGT 
FSRKNPPRVS YITALDFVIA ICFVFCFCAL LEFAVLNFLI YNQTKAHASP KLHHPRINSR
AHARTEIAESR ACARQHQEAF VCQIVTTEGS DGBBRPSCS& QQPPSPGSPE GPRSLCSKLA 
3 STWQQGELCI HVYRLDNYSR WFPVTFFFF HVLYWIiVCLlir 



I I I I I 

GCCAGAGCGT GAGCCGCGAC CTCCGCGCAQ GTGGTCGCGC CGGTCTCCGC 
TCCAAAGTTC TTCCAGTCCT CCTAGGCATC TTATTGATCC TCCAGTCGAG AACftTGTATA 
CAGAGRAGTG CTCAAATCAT AAGTGTACAG CTGATGAGTT GTCAAAAAAT GACCACAGCG 
GTGTAAAGAA AGCCAAATCA AGGACCCGAA TGTGAGCAGG ACCTCAGAAG CCCCCTTTGT 
CACTGCCTCC CAGCAAflGGC AGCACTATCC GGACTTCTAA CACCATCGGT GAGTTTCATA 
CCTTGGCAGA TGGCCTTTAA CATTTTTGTT TAATTCAATT ATTCTTACTA ATCTTCTTCT 
TTTTCTTGGC TGTGGTGCAT GGCTGTGGAG CTCAGGG 

rCTGTGG GTGGAGGACI CCTGCCTTTC C 




CACAACCAAC AAAACCGCAA AATATTCCCA C 
CICTGGCTTT TCCICTCAGC CCTGGCCCTC T 
ICAGGCTGAC TAGAGGCCAA GGCGACCAAC A 
AAATGCCCTC TTCRTTTCAC GTGTAAi 

TTTTTTCTTA AATftftAAGAG TGATCATAAA AGAGGGACAG CATAGAAAGT CCCCAAAGAG 
CftGCAftGGTT TTAAAGAAAT TCACAAGCCT AATCTGTCAC TGTCTTATAA TTTGCTATTA 
CCAGTCACAA TTTAACTAGG TTTTGTGTTG AAAACTTGTT TTGGTTTGCT TCTGTCCCAA 
GAGGCACTAG CTGGGGCCCC TACAGAGTGC AGGGCAGAGC TTCATTTTTC GTTTGAATGT 
TCTAGGGTCG AGGGACCTCA GACTGAATCA AAGAATGAAG CCTCTTCCCG TGATGTTGTC 
TATGGCCCCC AGCCCCAGCC TCTGGAAAAT CAGCTCCTCT CTGAGGAAAC AAAGTCAACT 
GAGACTGAGA CTGGGAGCAG AGTTGGCAAA CTGCCAGAAG CCTCTCGCAT CCTGAACACT 
AICCTGAGTA ATTATGACCA CAAACTGCGC CCTGGCATTG GAGAGAAGCC CACTGTGGTC 
ACTGTTCAGA TCTCCGTCAA CAGCOITGGT CCTCTCTCTA TCCTAGACAT GGAATACACC 
ATTGACATCA TCTTCTCCCA GACCTGGTAC GACGAAOSCC TCTGTTACAA CGACACCTTT 
GAGTCTCTTG TTCTGAATGG CAATGTGGTG AGCCAGCTAT GGATCCCGGA CACCTTTTTT 
AGGAATTCTA AGAGGACCCA CGAGCATGAG ATCACCATGC CCAACCAGAT GGTCCGCATC
TACAAGGATG GCAAGGTGTT GTACACAATT AGGAT6ACCA TTGATGCCGG ATGCTCACTC 
CACATGCTCA GATTTCCAAT GGATTCTCAC TCTTGCCCTC TATCTTTCTC TAGCTTTTCC 
TATCCTGAGA ATGAGATGAT CTACAAGTGG GAAAATTTCA AGCTTGAAAT CAATGAGAAG 
AACTCCTGGA AGCICTTCCA GTTTGATTTT ACAGGAGTGA GCAACAAAAC TGAAATAATC 
ACAACCCCAG TTGGTGACTT CATGGTCATG ACGATTTTCT
GGCTATGTTG CCTTTCAAAA CTATGTCCCT TCTTCCGTGA 

A TCAAGACAGA GTCTGCTCCA GCCCGGACCT CTCTAGGGAT 
:GTTGGG CACCTTTTCr CGTAAGAATT 
ACAGCCTTGG ATTTCTATAT CGCCATCTGC TTCGTCTTCT 
TTTGCTGTGC TCAACTTCCT GATCTACAAC CAGACAAAAG CCCATGCTTC 
CGCCATCCTC GTATCAATAG CCGTGCCCAT GCCCGTACCC GTGCACGTTC CCGAGCCTGT 
GCCCGCCAAC ATCAGGAAGC TTTTGTGTGC CAGATTGTCA CCACTGAGGG AAGTGATGGA 2220 
GAGGAGCGCC CGTCTTGCTC AGCCCAGCAG CCCCCTAGCC CAGGTAGCCC TGAGGGTCCC 2280 
CGCAGCCTCT GCICCAAGCT GGCCTGCTGT GAGTGGTGCA AGCGTTTTAA GAAGTACTTC 2340 
TGCATGGTCC CCGATTGTGA GGGCftGTACC TGGCAGCAG6 GCCGCCTCTG CRTCCATGTC 2400 
TACCGCCTGG ATAACTACTC GAGAGTTGTT TTCCCAGTGA CTTTCTTCTT CTTCAATGTG 2460 
CTCTACTGGC TTGTTTGCCT TAACTTGTAG GTACCAGCTG GTACCCTGTG GGGCAACCTC 2520 
TCCAGTTCCC CAGGAGGTCC AAGCCCCTTG CCAAGGGAGT TGGGGGAAAG CAGCAGCAGC 2580 
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GTCTTTGCTG CCCCTCTCCC CTACCTGGCC CATTCACTGA GTTTTC7CAG C 

CAAATTATTA ATAAATGGGC CACCTCCCTC TTCTTCAAGG AGCATCCGTG ATGCTCAGTG 

TTCAAAACCA CAGCCACTTA GTGATCAGCT CCCTAAAACC ATGCCTAAGT ACAGGCGGAT 

TAGCTATCTT CCAACAATGC TGACCACCAG ACAATTACTG CA7TTTTCCA GAAGCCCACT 

ATTGCCTTTG CAGTGCTTTC GGCCCAGTTC TGGCCTCAGC CTCAAAGTGC ACCGACTAGT 

TGCTTGCCTA TACCTGGCAC CTCATTAAGA TGCTGGGCAG CAGTATAACA GGAGGAAGAG 

MTCAGA TTATTATGTT CTCAGTTCTC TCTCCCTGCT ACCCCTTTCT 

G ATAGACACTG GCATTATCCC TTTAGGAAGA GGGGGGGGCA GCAAGAGAGC 

CTATTTGGGA CAGCATTCCT CTCTCTCTGC TGCTGTGACA TCTCCCTCTC C 






: CCTGCTTTAT ATGCCACCCT CTTCCTTCTC TTTGACCCCT 
r CCCAGTGACT TCCCCTAGCC CTGACCAGGC ACTAGGCCTT 
Z TGGGGCCAAG AAACTAAGGA ARCTCGGCTT TGCAACAGGC ATTACTCGCC 
G GGCACACTGT CGGAGTTCTA TCACTTGCTT G 
CCCATAAACC AGTCCACTGT TATACCCGGG GCACTCIAAC CATCACAATC Ai 
ATTCCCTTAA ATTTGTATGG CACTGGAACT TTGGCRAAGC ACTTTTGACA AGTTGTGTCT 3 600 
GATTGGAGCT TCRTGATAGC CTTGTGACaT CTTTAGGGCA GGATTCTTAT CCCCATTTTG 3 660 
CAGATGRftAA CCCTGAGTCA CAGATTTCIG TGGGACTGTG GATCTCACTG GAAGCTATCC 3720 
AAGAGCCCAC TGICACCTTC TAGACCACaT GATAGGGCTA GACAGCTCSG TTCACC3\TGA 3780 
TTCTCTTCTG TCACCTCTGC TGGCACACCA GTGGCRAGGC CCftGAATGGC GACCTCTCTT 3340 
TAGCTCAATT TCTGG6CCTG AGGTGCTCaG ACTGCCCCCA AGATCAAATC TCTCCTGGCT 3900 
GTAGTAACCC AGTGGAATGA ATTTGGACaT GCCCa«.TGC TTCTATATGC TARGTGftAAT 3960 
CIGTGTCTGT AATTTGTTGG GGGGTG6ATA GGGTGGGGTC TCCATCTACT T 
ATCATCTGAA ATQGGGAAAT ATGTAAATAA ATATATCAGC AAAGC 



MBYTIDIIFS QTWYEBRLCY NDTFESLVLN GNWSQLWIP DTFFRNSKRT HEHEITMPNQ 
MVRIYKDGKV LYTIRMTIDA GCSLHMLRFP MDSHSCFLSF SSFSYPENEM lYKWENFKLB 
INBKNSWKLF QFDFTGVSNK TBIITTPVGD PMVMTIFPNV SRRFGYVAFQ HYVPSSVTTM 



LSWVSFWIKT BSAEAETSLG ITSVIiTMTTL GTFSRKNFPR VSYITALDFY

ALLBFAVLNF LIVNQTKAHA SPKLRHPRIN SRAHARTRAR SRACARQHQE AFVCQIVTTB 
GSDGBERPSC SAQQPPSPGS PEGPRSLCSK LACCEWCKEF KKYPCMVPDC EGSTWQQGRL 
F FFMVLYWLVC LNL 



Seq ID NO: 11 DNA sequence
Nucleic Acid 

Coding sequence: 22..1614

1 11 21 31 

I I I I 

TTCGGCACGA GTAAGACCAG GATGTCTCTG AAATGGACGT CAGTCTTTCT G 
CTCAGTTGTT ACTTTAGCTC TGGAAGCTGT G 

lATCCTG GAAGAGCTTG TTCAGAGGGG T 
C TTCTACTCTT G 

TTAGAAGTTT ATCCTACATC TTTAACTAAA AATGAITTGG AAGATTCTCT TCTGAAAATT 300 

CTCGATAGAT GGATATATGG TGTTTCAAAA AATACATTTT GGTCATATTT TTCACAATTA 3 60 

CAAGAATTGT GTTGGGAATA TTATGACTAC AGTAAOIAGC TCTGTAAAGA TGCAGTTTTG 42 0 

AATARGAAAC TTATGATGAA ACTACAAGAG TCAAAGTTTG ATGTCATTCT GGCAGATGCC 180 

CTTAATCCCT GTGGTGAGCT ACTGGCTGAA CTATTTAACA TAOCCTTTCT GTACAGTCTT 540 

CGATTCTCTG TTGGCTACAC ATTIGAGAAG AATGSTGGAG GATTTCTGTT CCCTCCTTCC 600 

TATGTACCTG TTGTTATGTC AGAATTARGT GATCAAATGA TTTTCATGGA GAGGATAAAA 660 

AATATGATAC ATATGCTTTA TTTTGACITT TGGTTTCAAA TTTATGATCT GAAGAAGTGG 720 

GACCAGTTTT ATAGTGAAGT TCTAGGAAGA CKCACTACAT TATTTGAGAC AATGGGGAAA 780 

GCTGAAATGT GGCTCATTCG AACCTATTGG GATTTTGAAT TTCCTCGCCC ATTCTTACCa 840 

AATGTTGATT TTGTTGGAGG ACTTCACTGT AAACCAGCCA AACCCCTGCC TAAGGRAATG 900 

GAAGAGTTTG TGCAGAGCTC TGGAGAAAAT GGTATTGTGG TGTTTTCICT GGGGTCGATG 960 

ATCAGTAACA TSICAGAAGA AAGTGCCaAC ATGATTGCAT CAGCCCTTGC CCAGATCTCa 1020 

CAAAAGGTTC TATGGAGATT TGATGGCAAG AAGCCRAATA CATTAGGTTC CAATACTOGA 1080 

CTGTACAAGT GGITACCCCA GAATGACETT CTTGGTCATC CCAAAACCAA AGCTTTTATA 1140 

ACTCATGGTG GAACCAATGG CATCTATGAG GCGATCTACC ATGGGATCCC TATGGIGGGC 1200 

ATTCCCTTGT TTGCGGATCA ACATGATAAC ATTGCTCACA TGAAAGCCAA GGGAGCAGCC 1260 

CTCAETGTGG ACATCAGGAC CATGTCAAGT AGAGATTTGC TCARTGCATT GAAGTCAGTC 1320 

ATTAATGACC CTGTCTATAA AGAGAATGTC ATGAAATTAT CAAGAATTCA TCATGACCftA 1380 

CCAATGAAGC COCTGGATCG AGCAGTCTTC TGGATTGAGT 1TGTCATGCG CCACAAAGGA 1440 
GCCAAGCACC TTCGAGTCGC AGCTCACAAC CTCACCTGGA T 
I TCCTGCTGGC CTGCGTGGCA ACTGTGATAT T 
r TCCGAAAGCT TGCCAAAACA GGAAAGAAGA A 

3AAAGAT GGGACTCCTC CTTTATTTCA GCATGGAGGG 1680 

TTTTAAATGG AGGATTTCCT TTTTCCTGTG ACAAAACATC TTTTCACAAC TTACCTTGTT 1740 

AAGACAAAAT TTATTTTCCA GGGATTTAAT ACGTACTTTA GTIGGAATTA TTCTATGTCA 1800 

ATGATTTTTA AGCTATGAAA AATACAATGG GGGGAAGGAT AGCATTTQGA GATATACCTA 1860 

ATGTTAAATG ACGAGTTACT GGATGCAGCA CGCAACATGG CACATGTGTA TACATATGTA 1920 

GCTAACCCTT CGTTGTGCAC ATGTACCCTA AAACTTAAAG TATAATTTAA AAAAAGCaAA 1980 

AAAAAAAAAT ACCAACTCTT TTTTTTAAAC CAGGAAGGAA AAIGTGAACA TGGAAACAAC 2040 
TTCTAGTATT GGATCTGAAA ATAAAGTGTC A 

Seq ID NO: 12 Protein sequence
MSLKWTSVFL LIQLSCYFSS GSCGKVIiVWP TEYSHWIMMK TIIiEKLVQHG HEVTVLTSSft
STLVNASKSS AIKLEVXPTS LTKHDLEDSL LKILDRWIYG VSKNTFWSXF SQLQELCWEy 
XDYSNKLCKD AVLNKKLMMK LQBSKFDVIL ADftLNPCGBL LftBLFNIPFL YSLRPSVGYT 
FEKHSGGFLF PPSYVFWHS ELSDQMIFME RIKHMIHMLY FDFHFQIXDL KKMDQFYSEV 
LGEPTTLFET MGKAEMWLIR TYWDFEPPRP PLPHVDFVGG LHCKPAKPLP KEMEEPVQSS' 
QENGIWFSL GSMISNMSEE SAHMIASALA QIPQKVIiHRF DGKKPHTLGS NIRLYKNLPQ 
NDLLGHPKTK APITHGGTNG lYERIYHGIP MVGIPLFSDQ HDNMHMKRK GRM.SVDIRT 
MSSEDLLNAL KSVItnOPVYK ENVMKLSRIH HDQPMKPLDR AVPWIEPVMR HKGAKHLRVA 
AHNLTWIQYH SLDVIAFLLA CVATVIFIIT KFCLFCFRKL AKTGKKKKED 

Seq ID NO: 13 DNA sequence 

Nucleic Acid Accession #: NM_014109.1

Coding sequences SSI.. 1739 



TGCTTTGGAA AAGTTTACTG TATATACATT AGACATTCCT GTTCTTTTTG 

TATAGTGTAT GTTCCTCATA TCCAC3TGTG GTG3GAAATA GTTGGACCGA 
CACTTAAAGC CACATTTACC ACATTATTAC AGAATATTCC TTCATTTGCT CCAGTTTTAC 
TACTTGCAAC TTCTGACAAA CCCCATTCCG CTTTGCCAGA AGAGGTGCAA GAATTGTTTA 
TCCGTGATTA TGGAGAGATT TTTAATGTCC AGTTACCGGA TAAAGAAGAA CGGACAAAAT 



TTTTGCAGGC TTTGGAGGTA CTCCCAGTAG C 
WGACTA GAAGAACAAG A 
k TGTTACRCAT S 
CTGTTGACCC TGATGAGGTT C 
CATCTGTAAT CAGTAAAATT GATCTACACA AGTATCTGAC TGTGAAAGAC TATTTGAGAG 



GTCTTATTAG GCATAGAGCC TGTGCTTTAA GAGATACTGC CTATGCCATA ATTAAAGAAG 840 

AACTT6ATGA AGACTTIGAG CAGCTCTGIG AAGAAATTCA GGAATCTA6A AA6AAAAGA6 900 

GTTGTAGCTC CTCCAARTAT GCCCCGTCTT ACTACCATGT GATGCCAAAG CRAAATTCCA 960 

CTCTTCTTGG TGATAAAAGA TCAGACCCSU; AGCA6AATGA AAAGCTAAAG ACACCGAGTA 1020 

CTCCTGTGGC TTGCAGCACT CCTGCTCAGT TGAAGAGGAA AATTCGCAftA AAGTCAAACT 1080 

GGTAOTTAGG CACCATAAAA AAGCffiAAGGA AGATTTCACA GGCAAAGGAT GATAGCCAGA 1140 

ATGCCATA6A TCACAAAATT GAGAGTGATA CAGAGGAAAC TCAAGACACA AGTGTAGATC 1200 

ATAATGAGAC CGGAAACACA GGAGAGTCTT CGGTGGAA6A AAATGAAAAA CAGCAAAAT6 1260 

CCTCTGAAAG CAAACTGGAA TTGAGAAATA ATTCAAATAC TTGTAATATA GAGAATGAGC 1320 

TTGAAGACTC TAQGAAGACT ACAGCATGTA CAGAATTGAG AGACAAGATT GCTTGTAATG 1380 

GAGATGCTTC TAGCTCTCAG ATAATACAIA TITCTGATGA AAATGAAGGA AAAGAAATGT 1440 

GTGTTCTGCG AATGACTCGA GCTAGACGTT CCCAGGIAGA ACAGCAGCAG CTCATCACTG 1500 

TTGAAAAGGC TTTGGCAATT CTTTCTCAGC CTACACX:CTC ACTTGTTGTG GATCATGAGC 1560 

GATTAAAAAA TCTTTTGAAG ACTGTTGTTA AAAAAAGTCA AAACTACAAC ATATTTCAGT 1620 

TGGAAAATTT GTATGCAGTA ATCAGCCAAT GTATTTATCG GCATCGCAAG GACCATGATA 1680 

k ATGGAdCAAG AGGTAGAAAA CTTCAGTTGT TCCAGATGAT 1740 

TTATATT CAGTTCCTAT TTAAGTCATT TTTGTCATGT 1800 

T GATGTAGTAT GAAACCCTGC ATCTTTAAGG AAAAGATTAA AATAGTAAAA 1860 



ATAACTGATA AATA 



I I I I I I

MDLSSVISKI DIiHKYLTVKD YLRDIDLICS KALEYNPDRD PGDRLIRHRA C 
IKEELDEDFE QLCEEIQESR KKRGCSSSKY APSTYHVMPK QNSTLVGDKR S 
TPSTPVACST PAQLKRKIRK KSNWYLGTIK KRRKISQAKD DSQNAIDHKI ESDTEETQDT 
SVDHNETOIT GBSSVBENBK QQHASBSKLE LRTOrSNTari BNBLBDSRKT TACTBLRDKI 
AOfGDASSSQ IIHISDENBG KBMCVLRMTR ARRSQVBQQQ LITVBKftlAI liSQPTPSLW 
DHERLKNLIiK TWKESQHYH IFQLEKLYAV I5QCIYRHRK DHDKTSLIQK HBQEVENFSC 



1 1 I I 1 I 

TATATGTGAC CTTTTTAAAA AATGAGCTGT AAGCAGTCTC CCAGACAGTA GCTCAGCCTC 

CAGAACTCTC TTTCTGCftTA GTTGAAGACC CCTCTTCACA C!AAGATGGTA GCAACAAATC 

ATAGGTGCAA TTGCACCRAA TTCACAGRAG ATCAATTGAA AATCCTCATC AATACCTTCA 

CTCSiAAAACC TTACCCAGGT TATGCTACCA AACKAAAACT TGCTTTAGCA ATCKATGCAG 

AAGAGtCCAG AATCCAGATT TG6TTTCAGA ATCAAAGAGC TAGGCATGGA TTCCAGRAAA 

C3VCCAGAACC TGACTTTAGA TTTAAGCCAC AGCCATGGAC AAQATTAACC TQGTGTGGAG 

TTTCAAAATA GAGAAGCCAG ATGGTGTTGT ACCACCTATA GCACCTTTCA ATTACACACa 

ATCATCCATG CATTTATGAA AAACCCATAC CCTGGGATTG ATTCXGGAGA ACAACTTGCT 
GAAGAAATTG GTGCTTCAGA GTCAAGAGTC CAAATTTGGT TCCRAAATCA AAGATCIAGA 

TTTCATCTCC AGAGAAAAAG AGAACCTGTT ATGTCCITAG AATGAGAAGA CCAGAGAAGA 
CCAGGGGCAA GGTTTCTiSAG GGACITCAAG GTACAGftAGA TACACAAAGT GGCACCAGCC 

TCACTAGCAC TCTCATTTCT CAAGAGCCAG AACATGGTGA ATACaGTCAA GTTCAGTGTA
a TCAATTTGGG CCCCAAftTCT CTCTCACAGT CTTCCTGGGA GTCTATTCTT
G TGCAaGCTflA GCCTTCTGAA GATGGTAAAG AACTTGGCCG GGTGTGGTGG 
CTCATGCCTG TAATCCCAGC ACTTTAGGAG GCTGAGGCTG GAAGATTGCT T 






3 GTACAGGAGG CACCACACTA CCCTGTTGAC 

ACAGCCTGGA TCCAGAGTTC A 

ATTTTTCAAT CATTGCAGTA ATTATTGATT TGGACAAAAA T 
AAAGTGACGT TTCTCTGCCT ATGGAGTGGT CATTCTTITA TTCCTTTAG 
ATTTTCTTTT ACTTAAAAAA ACTTATAGTT TGATGAAGAG TGAGATATAT ACCTCATCTC 
AAAGAATCTT CACACACACA CTTATTAATT ACAAAAGGAA AATCAGTAAT TTTGCAGTGG
AGACATATGG CCAACTCCAC CTTACCCAAG TGGCTGAAAG TCACTGCACC AGTAATGGCA 
CAAACCAATG TGAGATGATT CCTGATATGA TACACTAAAA AGGGCACTGT CTCTTCTGCA 
TGTTGCAGAC AAAAAGTGGG TAAGCTGACA CTGAAACTAA TAATTAGGCA ATGTCAAGCA 
AATACAAATT CAAGTTGACA GTCTGCAAAG TAACATCC31T GTACTCITCA ACAATGGATC 
GACCCTAGCT ACTCAGGAGG CTGAGGTGGA ATAATTGTTT GftGGCCaGGA GTTCCAGATC
AGCCTGGGCA ACATCATGCG ACCCCATCTC TAAAAACATC TTTTTAftAAA TGAGCCAGGT 
GTGOTASCAT GCAOCCGTAG TCTCAGCTAC TCAGGAGCCT GSGGCSGGAG GATGGTTTCA 
ACATAGGAGA TCGAGGCTGC TGTGAGCTAT GATOSTGCTA CTCCACTCCA GCCTGGGTGA 
CACSGCRRGT TCCTGTTTCC AfiACMCAAC AAGRAAACAA AACAAAACAA AACAAAARAT 
AGATAGAATA GTGACAATAA AAATGQAGAA AAACSTAGGCT GACTCAGGAA ATGCTTAGAA
AGTACAGCCA TACCTCAAAG ATATTGTAGA TTTGATTOSA GACCACEACA ATAAAGCAGA 
TATTGCTACA AAGTGAGTCA CaCaAATTGT TTTGTTTCCT TGTGAATATG AAGTTATATT 2040 
GGCTGGGTGT GATGGCTCaT GCCTATAATC CXaGTAOTTT AGGAGACGGA GGCGGGAGGG 2100 
AT 

' TAAAGTGTGA AATGGOGTTA 2220 
AT TTTAAAACTC TTGATGCTGG CTGGGTTCEG TGGCTCaTAC CTGTAATCCC 2280 
ATCACTTTGG GAGGCCaAGA CAGGTTGATT ACTTGAATTC AGGAGTTCAA GACCAGCCTG 2340 
GACAACATGG CAAAACACBT CTTTAAAAAA AGftAAAGAAA AAAGAftftAAC AGAARGAAAA 2400 
AGAAGAAAAA CTACTTGCTG CECTTACTTG AAGCTCAATT ATTTAAAAC 



1 11 21 31 

I I I I I I

2j,j.j„rYj.j.jj .YYmTTTTi: tagtagagac agggtttcac catgttagcc agg 
CGATCTCCTG ACCTCATGAT CTTCCTGCTT TGGCCTCCCA AAGTGCTGCG ATTACAGGCG 

. TCCAGATATA GGCOCATCAT AGACATCACA CAAGCGTGTA CTTCATMTC C7GGTGAATA 

CAGAAGTTTC CTGGACTCCT TGATGAGCTA CTGCTTTOGC TCCTATATCA GTGTTTTCAG
T TTGTGATTGT GTTTCTGACT TTCIGTAGGC AGAAAAAAAC TTTCATTTTT 
TA AATGTAAGCG CTAATTCTTA TATTAAACTG TTTATTTCTA 



TGA ACCAAACCAA GAGCATAAGG AATGATAACC 
TTCAAAACTG ATTAAATTAG AGATCAATAA ATGGAGCTGT TTTAATTCTA TTATTCTTCT 
TTCATAGATT A 



GGCACGAGAA GAOGCCACaT CCCCTATTAT AGARGAGCTA ATAAATTTCC ATGATCACAC 
ACTAATAATT GTTTTCCTAA TTAGCTCCTT AGTCCTCTAT ATCATCICGC TAATATTAAC 
AACAAAACTA AC3«aiTACAA GCACAATAGA TGCaCAAGRA GTTGAAACCa TTTGAACTAT 
TCIACCAGCT GTAATCCTTA TCATAATTGC TCTCCCCTCT CTACGCATTC TATATATAAT 
AGAOGAAATC AACAACCCCG TATTAACCGT TAAAACCATA GGGCACCAAT GATACTGAAG 
CTAOGAATAT ACTGRCTATG AAGACCTATG CTTTGATTCA TATATAATCC CAACAAACBA 
CCTAAAACCT GGTGAACTAC GACTGCTAGA AGTTGATAAC CGAGTCGTTC TGCCAATAGA 
ACTTCCAATC CGTATATTAA TTTCATCTGA AGAOSTCCrC CACTCATGAG CAGTCCCCTC 
CCTAGGACTT AAAACTGATG CCATCCCAGG CCGACTAART CCAGCACAGT ACATCAACCG 
ACCAGGGTTA TTCTATGGCC AATGTCTGAA TTTGTGGTCT TACCATAGCT TTTTGCCATT 
GTCCTAGAAT GGGTCCCTAA AATATTTCGG NACKsGTCTG 



Seq ID NO: 18 DNA sequence



GTGTACATCA GAGCAAAAAT ACAGAGTATT TATTCATTTC TTCCCACTAG AGGC-ACACAC 
A CAGACAAATG AATCATCAGT TGTCAGGAGT TGCCTTTGGA GAATGATCAA 
T TTCAGGGGTT GGAAATTGAT ACCAGGGTCC ATCACCTCGG GCACGCATCA 
GCCTTCGAAC TTCCTGCTCC TTTAACCGTA ACTCAGCCTT TTCAGATTO^ ATCTGGAGGA 
TAGCCAGGGT TTTCTCGTAG TTCTTTTCAG GGCCATCATA GAAATTCCGG GCGATCCRTC 
TTGATATCGG ATGCTTGTAA TACTCCCAGT GTTCAGGGAT GTAGCCTTCT GGGATTTCTG 
CAAGCTCGGC TTCACCAATA AATATGTTCA CCAGTGTTAT GCCAATTATA ACTGGGATCC 
CAGTCAftCAT AAGGTAGAAT TTCATTAACC TCAAGAAGCG AGCGTCATAG TATAAAGAAG 
GCTTGACGAC AAACA6TCTC TTGCCATGTC CCCACTGTGC OGCACAGGAG CGACAGTCTT 
CGGAAANTCC GCGTGAGAAA ACTTCCGACT CCGAGTCTAG GACCAGCGCG GCGGCAAGAC 
CACGCTGTCA GOSCGGAGAC CGAANCCGCT GCAGCAGCTC ATGGCCGCCA TGG
TAGTCCSGTN AATTACTTTA ATTTCGCTTT TCCATAATAC TGGTATTCCA TAGAAGSiAftA
TCTTTTATTA ATATTCTATA CTACTACATC QSACAOCAGA TGACTAAAST TTGCAAIGGT 
CCAAAATTCT GTAAACCCAT TAAATGCAAT TCaTACTTTA TTTTGGCAGT ATTCATTTCA 
TCATTACTTT ATTTGGATGC TAAOSCAAGT ACTTCEAAGG AAAAGCTGTC ATATAATTAC 

TTTAGTCAAG CATTCAGTAG AGGCAATAAT CAAACCTCTA TCCCAACATT TTACACITGT 
AACAGAATGA AGGATGAGGT ACAACATACA TTTTTGGCAA TTTACTATTA AGGGCC^TAA 
TCATTTTAGG GGCGCTTAGG GCCCATATAT ATATATATAT ATTTTTGGAC A 




GCGCGCCCGC CCGCCGTCGC TGCCCTTCCT GTTGGGATTA TCTTCTGCTC C 

2 CGCGGTCGAA GCCGCCTCTA GGCTTCAGCG GCTCGGACTC CTTGGCAGCC 

3 CTACCTGGGC CTCGTAGCTG GGAGACCCTT GGGCGAGACC ATGAGGAAAT 
TCAACATCAG GAAGGTGCTG GACGGCCTGA CCGCAGGCTC GTCCTCGGCC TCGCAACAGC 
AGCAACAGCA GCAGCACCCG CCTGGGAACC GGGAGCCCGA GATCCAGGAG ACECTCCAGT 
CCGAGCACTT CCAACTCTGC AAGACTGTTC GCCATGGATT TCCCIATCAG COCTCAGCCC 
TGGCCTTTGA TCCCGTTCAG AAGATCCTGG CGGTAGC3AAC CCAGACTGGT GCTTTAAGGC 
TCTTTGGTCG TCCAGGGGTG GAATGTTATT GCCAGCaCGA CAGOSGAGOS GC3«3TGATTC 
AACTCCAGTT CCTGATTAAT GAGGGAGCCC TTGTGAGTGC CTTGGCTGAT GACACCITAC 
ACTTGTGGAA TTTACGTCAG AAAAGGCCTG CTGTGCTACA TTCACTCAAA TTTTGCAGAG 
AAAGGGTTAC ATTTTGCCAT CTGCCTTTCC AGAGTAAGTG GCTCTATGTG GGCACGGAAC 
GAGGTAATAT ACATATTGTC AATGTGGAGT CCTTCACaCT CTCAGGCTAC GTCATTATGT 
GGAATAAAGC CATCGAACTG TCATCTAAAT CTCACCCAGG ACOTGTTGTC CATATAAGTG 
ATAATCCCAT GGACGAGGGG AAGCTTCTGA TTGGCTTTGA ATCTGGAACA GTAGTCTTAT 
GGGACCTTAA GTCAAAGAAG GCTGACTACSi GATACACTTA CGAOSAGGCT ATTCACTCTG 
TGGCTTGGCA TCATGAAGGA AAACAGTTTA TTTGCAGTCA TTCTGATGGT ACATTGACCA 
TATGGAATGT GAGGTCCCCT ACTAAACCTG TACAGACCAT CACTCCTCAC GGAAAAC3«3T 
TAAAGGATGG GAAGAAACCC GAGCCGTGCA A6CCTATCCT CAAGGTGGAG TTCAA6ACAA 
CSAGATCBGG GGAACCTTTT ATTATTTTGT CGGGAGGCTT ATCATATGAT ACOGTGGGAA 
GAAGACCTTG CTTAACAGTG ATGCATGGQA AAAGCAOQGC AGTGCIGGAA ATQGACTATT 
CAATTGTCGA CTTTCTCACA CTCTGTGAAA OSCCATATCC AAATGATTTT CAGGAGCCGT 
ATGCTGTGGT TGTTCTCCTG GAGAAGGATT TAGTGCTGAT AGACCTGGCA CAGAATGGAT 
ACCCTATATT TGAGAATCCC TACCCTTTGA GTATACACGA GTCCCCTGTT ACATGTTGTG 
AATATTTTGC TGATTGTCCT GTGGACCTTA TTCCTGCACT TTATTCTGTT GGAGCTAGAC 
AGAAAOSTCA AGGTTACAGC AAAAAGGAAT GGCCCATCAA TGGTGGTAAT TG3GGCTTGG 
GTGCTCAAAG TTACCCAGAA ATAATTATTA CAGGGCATGC TGATGGCTCA ATTAAATTCT 

C TGCAATAACT CTACAAGTAC TGTATAAATT AAAAACATCT AA-i^GTATTTG 
CAAAGAT GACAGACAGA ACRCOSACAT TGTAGATGAA GATCCATATG 
CCATTCAGAT CATCTCCTGG TGCCCAGAGA GCAGAATGCT GTGCATAGCC GGAGTGTCGG 
CTCATGTCAT CATOTATAGA TTCAGCAAGC AGGAAGTGGT TACAGAAGTC ATCCCGATGC 

CCCCTTTGTC CACTCCCGTG GGCAGCTCCA CCTCTCAGCC CATCCCCCCT CAGTCTCATC 
CGTCTACCAG CAGCAGCTCA TCGGACGGGC TTCGAGATAA TGTACJCGTGT TTAAAAGTTA 
ARAACTCACC ACTTAAACAG TCTCCCGGCT ATCAAACAGA GCTAGTCATC CAGTTGGTGT 



AGAGAACCGA 

CIGTCGTCCC AGAGGATCGC TGCAARTCTC CGACTTCCGC AAAGATGTCA AGGAAATTAA 2520 

GCTTGCCAAC TGATCTAARG CCTGATTTAG ATGTGAAAGA CAATTCCTTC AGCAGATCTC 2580 

GGAGTTCAAG TGTGACCAGC ATTGACAAftG AGTCCCGGGA AGCCATTTCT GCTCTTCATT 2640

TCTGTGAGAC TTTCACAAGG AAGGCAGACT CCTCCCCCTC CCCGTGCCTG TGGGTGGGAA 2700 

CCACAGTGGG AACTGCCTTT GTCATCACGC TGAATCTCCC CCTGGQGCCT GA3CAGAGAC 2760 

TGCTTCAGCC AGTGATTGTG TCTCCAAGCG GTACTATATT 3AGGTTAAAA GGTGOjATCT 2820

TGAGAATGGC ATTTCTGGAT GCCGCGGGCT GCTTAATGCC ACCTGCATAC GA^iCCCTGGA 2880 

CAGAGCACAA CGTTCCTGAA GAAAAAGACG AAAAGGAGAA ATTGAAAAAG OSGCGACCTG 2940

TCTCAGTGTC CCCCTCCTCT TCTCAGGAAA TTAGTGAAAA CCAGTACGCA GTGATATGTT 3000 

CTGAAAAGCA AGCAAAGGTC ATCTCaCTGC CAACCCAGAA CTGTGCATAC AAGCAGAACA 3060 

TCACTGAGAC GTCCTTCGTG CTCCGTGGAG ACaTTGTCGC CCTGAGTAAC AGTGTCTGCC 3120 

TCX3CCTGCTT CTGTGCCRAC GGCCRCATTA TGRCTTTCAG ITTGOCGAGC TTGAGGCCTC 3180 

TGCTGC3ATGT CTACTACCTG CCCCTTACCA ACRTGCGGAT AGCCAEGACa TTCIGCTICG 3240 

CCAACRGTGG GCAAGCCTTA TACCTTGTTT CACCTACCGA AATCCAGAGA CTCaCCTACA 3300 

GTCAGGAGAC GTGTGAARAC CTTCAGGAGA TGCTTGGTGA GCTCTTCACG CCTGTAGftAA 3360 

CACCAGAAGC ACCAAACAGA GGGTTCTTta AAGGCTTATT TGGAGGTGGI GCACaATCTC 3420 

TTGATAGAGA AGAACTGTTT GGAGAGTCAT CCTOGGGAAA GGCGTCAAGG AGCCTTGCaC 3480 

AGCACATCCC GGGTCCTGGC GGGATCGAAG GTGTGAAGGG AGCCGCGTCG GGAGTGGTGG 3540 

GAGAACTGGC CCGAGCCAGG CTGGCCCTCG ACGAAAGAGG ACAGAAGCTC AGCBACTTGG 3600 

AAGAGAGGAC TGCAGCCATG ATGTCCAGTG CAGACTOGTT TTCCAAACAT GCTCATGAGA 3660 

TGATGCTGAA ATACAAAGAT AAGAAGTGGT ACCAGTTCTG ACAASIAGCA CTCAGTAAGT 3720 

CCAGCTTCAA CCAGAAGGAA AAAGACGTTT CCTTGTTGAG GTCACTGATG TATTTGGGAA 3780 
AGATAACATA AAAGGGATGC ACACTGCTGA CAGCGTCTTT CCCAGCACAA TCRTGCACTT 

Seq ID NO: 21 Protein sequence 
Protein Accession #: AflD047S6
MRKFNIEKVL DGLTAGSSSA SQQQQQQQHP PGNEEPEIQE TLQSEHFQLC iCTVRHGFPYQ
PSALAFDPVQ KILAVGTQTG JUjRLFGRPGV ECYCQHDSGA AVIQLQFL 
DTLHLWNLRQ KRPAVLHSLK FCRERVTFCH LPFQSKWLW GTERGNIH 
VIMWNKAIEL SSKSHPGPW HISDNPtlDEG KLLIGFESGT WLWDLKSKK A 
3 KQFICSHSDG TLTIWNVRSP TKPVQTITPH G 
gGGLSYD TVGREPCLTV MHGKSTAVLE M 
QEPYAVWLL EKDLVLIDLA QNGYPIFENP YPLSIHESPV TCCEYFADCP VDLIPALYSV 420
GfiRQKHQGYS KKEWPINGGN WGLGAQSYPE IIITGHADGS IKFWDASAIT LQVLYKLKTS 480 
KVFEKSENKD DRQNTDIVDE DPYAIQIISW CPESRMLCIA GVSAHVIIYR PSKQEWTEV 540 
IPMLEVRLLY EINDVETPEG EQPPPLSTPV GSSTSQPIPP QSHPSTSSSS SDGLRDNVPC 600
LKVKNSPLKQ SPGYQTELVI QLVWVGGEPP QQITSLAMS SYGLWFGN'S NGIAMVDYLQ 660
KAVLLNLSTI ELYGSNDPYR REPRSPRKSR QPSGAGLCDI TEGTWPEDR CKSPTSAKMS 720
DVKDNSF SRSRSSSVTS IDKESREAIS ALHFCETFTE KADSSPSPCL 780 
? VITLNLPLGP EQRLLQPVrV SPSGTILRLK GAILRMAFLD AAGCLMPPAY 840 
EFWTEHNVPE EKDEKEKLKK RKFVSVSPSS SQEISEHQYA VICSEKQAKV ISLPTQNCAX 900 
KQNITETSFV LRGDIVALSN SVCLACPCSUf GHIMTPSLPS LRPLIBVYYL PLTNMRIART 960 
PCFANSGQAL YLVSPTEIQH LTYSQETCEN LQEMLGELFT PVETPEAPNE GFFKGLFGGG 1020 
AQSI.DREELF GESSSGKASR SLAQHIPGPG GIESVKQAAS GWGELARAR LALDERQQKL 1080 
SDLEERTAAM MSSADSFSKH AHEMMLKYKD KKHYQF 




1 11 21 31 41 51 

I I I I I I 

TCECATCGGG TGAACCGTGG TCTTGTTOCG TCCGCCCACA ATCGCTCTCC AGCTTTGACG 

GCCCaSGCAA AGCCTGGCTC GTTCACAGCT CTCTCGCACC TCCIGGAGCT TCAGCTTCTI
CCGTTGCAGA GARGCTTTAT GGGCCAATTC GTTCGGCATC CCGGGGGCAG GTGCGCGGTG 
CGCGGGGAAG AAGAGGATTT GACTGCGGTT CTCCACCCCC GGCGCCCMC CTCCACCCCG 
GTGCGCGCGC TCTTCCAGGC TCCTGCTGGT CCCACTTGCC AGGAGTTAGG TCTCAGGTCA 
GCCTGAGCTC CTGAGACGCC CAGGCCCGGA AAGACACGTA GGGGAAAOCA TCTGCTCACT 

TCTGTCCTGT CCGGAAGGGA TCCCTTTCTG ACGGGAAAGA AAGGCGCTAA ACAAGCACTG
GCCTTGAGAT AAGCAATGCT GAAGCACTTG CAGCTCACCT ATTACCATAA ACTGACTGAG 
CCCTCCCTAC ACAAGCCGTA ACTACTGCTT TGATTGGACA AGAGACTGAT TTCAGTAGTT 
TTCTCTTGAT AAGAGACCAC TGGCCGTGGG CGGGTTCTGG ACAGTTTACA GAAGCTATGC 
ACTTGATTGC CTTTGTGTCC CTGCTTCACC TTTTGAAGCA TAGGGCCTAA TTATAATGTA 

TTTAAATGTT GTCTCCACCC CAAAGTGAAC AIGGGTTGCA TGTAACAGGC ATGTTTACTC
AGCATGCATG CAGCAGGATC CCTTCACAAA TATTCAGAGC TCCOCCTATT CCCTGTTGAA 
TATGTATATG TGGCCAGCCA GATCAACGTA AATCACTATT CGCCCTCCCC TCCCTGGAAA 
CCTACTTTTC GGGTTTCAGC AGGAAGCTAT G 
TTCGGGCTTG ATAACCCCTT TATAAAAAAA TAAAATCTCC T 

CCACACCACC GGCCCGCAAC TATTGGGGGG G

TTCATGCACA TTGTTAAGGA GACAGGTGCC CCCAAGCAGG CGGACATCAC O 
GCTTGAGCAT GCCGAAGACG CGAGCGACTC ATAGAACACG ACGACGCTCG CAAGGCACTA 
AGCATAGCTA CTACCACTCG TCGAAGAGTC ATACAC3«3AT TTCTATTGGC GA 

Seq ID NO: 23 DNA sequence

Nucleic Acid Accession #: CAT cluster 

I I I I I I

CTATGAATCT CGGAAATTAC TCSUiACCATC AGCCTCTGCA AGAAGCAAAG TGGACGGCCG
GGCGCGGTGG CTCACTCCTG GAATCCCAGC ACITTGGGAG CCCGAGGTGG CGGGATCACG 
AGGTCRGGAG ATCGAGACTG TTCTGGCTAA ACCAGTGAAA CCCOCTCTCT ACTAAAAAAA 
TAAGAAflAGC GAAGTGCATC TCCCATAAAC GAGGTACTGC AGGAAGAAAG CAGAAAATGA 
GACCCGAGTA CACACATGCA OGCGGGCGCC GCACACaCAC ACCAGAAGAA ATGAACCAAG 
AGGAAACkSAA ACATTTICAA ATAAGCATIT GGAGATGGGA AAAACACCTT GAAACAGAAA
TTCATAAAGT ACAGAATTTT TTTTTAAGTT AAAAAAGGAA CAATAATAGA CAGAAAATGA 
ATGAAfiAATT AAATGTCATA TCAGAAGTGA AGATAAATTA AAAGTGGTCA AAGGAGAAGA 
GATCTAAATG CAAACTTAAG AAGGGGCAAT TTTTTTTTTT TTTTTTTTTG AGACGCAGCC 
TCACTCTGTC GC 



Seq ID N 
Nucleic 



C GAAGCCGCCG CCCGGAGCTG CCCTTTCCTC TTCGQTG.^AG TTT7TAAAAG 
CTGCTAAAGA CTCGGAGGAA GCAAGGAAAG TGCCTGGTAG GACTGACGGC TGCCTTTGTC 
CTCCTCCTCT CCACCCCGCC TCCCCCCACC CTGCCTTCCC C 
CCGCAGCTGC CTCAGTCGGC TACTCTCAGC CAACCCCCCT CACCACCCTT C 
GCCCCCCCGC CCCCGTCGGC CCAGCGCTGC CAGCCCGAGT TTGCAGAGAG GTAACTCCCT 
TTGGCTGCGA GCGGGCGAGC TAGCTGCACA TTGCAAAGAA GGCTCTTAGG AGCCAGGCGA 
CTGGGGAGCG GCTTCAGC&C TGCAGCCACG ACCCGCCTGG TTAGAATTCX: GGCGGAGAGA 
ACCCTCTGTT TTCCCCCACT CTCTCTCCAC CTCCTCCTGC CTTCCCCACC CCGAGTGCGG 
AGCAGAGATC AAAAGATGAA AAGGCAGTCA GGTCTTCAGT AGCCAAAAAA CAAAACAAAC
AflAMCAAAA AAGCCGAAAT AAASGAflSM. GATAATAACT CAGTTCTTAT TTGCACCTAC 780
TTCAGTG6AC ACTGAATTTG GAAGGTGGAG GATTTTCTTT TTTTCTTTTA AGATCK3GGC 840
ATCTTTTGAA TCTACCCTTC AAGTATTAAG AGACAGACTG TGAGCCTAGC AGGGC3«3ATC 900 
TTGTCCACCB TGIGTCTTCT TCTGCACGAG ACTTTTGAGGC TGTCAGAGCXS CITTTIGCGT 960 
GGTTGCTCCC GCSU^GTTTCC TTCTCTGGAG CTTCCCGCAG GTGGGCAGCT AGCTGCAGCG 1020 
ACTACCGCAT CATCACaGCC TGTTGAACTC TTCTGAGCAA GAGARGGGGA GGCBGGGTAA 1080 
GGGAAGTAGG TGGAAGATTC AGCCSUIGCTC AAGGATGGAA GTGCAGTTAG Gi 
GGTCTACCCT CGGCCGCCGT CCAAGACCTA CCGAGGAGCT TTCCAGJ 
CGTGCGCGAA GTGATCCAGA ACCCGGGCCC CAGGCACCCA GAGGCCGQGA GCGCAGCACC 
TCCCGGCGCC AGTTTGCTGC TGCTGCAGCA GCAGCAGCAG CAGCAGC 
GCAGCAGCAG CAGCAGCAGC AGCAGCAAGA GACTAGCCCC AGGCAGCAGC Ai 
GGGTGAGGAT GGTTCTCCCC AAGCCCATCG TAGAGGCCCC ACAGGCTACC TGGTCCTGGA 
TGAGGAACAG CAACCTTCAC AGCCGCAGTC GGCCCTGGAG TGCCACCCCG AGAGAGGTTG 
CGTCCCAGAG CCTGGAGCCG CCGTGGCCGC CAGCAAGGGG CTGCCGCAGC AGCTGCCAGC 
ACCTCCGGAC GAGGATGACT CAGCTGCCCC ATCCACGTTG TCCCTGCTGG GCCCCACTTT 
CCCCGGCTTA AGCAGCTGCT CCGCTGACCT TAAAGACATC CTGAGCGAGG C 



GAGGGAGGCC TCGGGGGCTC CCACTTCCTC CAAGGACAAT TACTTAGGGG GCACTTCGAC 1800 

GGAGGCGTTG GAGCATCTGA GTCCAGGGGA ACAGCTTCGG GGGGATTGCA TGTACGCCCC 1920 

ACTTTTGGGA GTTCCACCCG CTGTGCGTCC CACTCCTTGT GCCCCATTGG CCGAATGCAA 1980 

AGGTTCTCTG CTAGACGACA GCGCAGGCAA GAGCACTGAA GATACTGCTG AGTATTCCCC 2040 

TTTCAAGGGA GGTTACACCA AAGGGCTAGA AGGCGAGAGC CTAGGCTGCT CTGGCAGOSC 2100 

TGCAGCAGGG AGCTCCGGGA CACTTGAACT GCOSTCTACC CTGTCTCTCT ACAAGTCCGG 2160 

AGCACTGGAC GAGGCAGCTG CGTACCAGAG TOSCGACTAC TACAACTTTC CACTGGCTCT 2220 

GGCCGGACCG CCGCCCCCTC CGCOSCCTCC CCATCCCCAC GCTOSCATCa AGCTGGAGAA 2280 

CCCGCTGGAC TACGGCAGCG CCTGGGCGGC TGCGGOSGCG CAGTGCCGCT ATGGGGACCT 2340 

GGCGAGCCTG CATGGCGCGG GTGCAGCGGG ACCCGGTTCT GGGTCaCCCT CAGCCGCCGC 2400 

TTCCTCATCC TGGCACACTC TCTTCACAGC CGAAGAAGGC CAGTTGTATG GACOGTGTGG 2460 

TGGTGGTGGG GGTGGTGGCG GCGGOSGCGG CGGCGGOGGC GGOGGOSGCG GCGGCGGCGG 2520 

OSGCGGOSGC GAGGCGGGAG CTGTAGCCCC CTACGGCTAC ACTCGGCCCC CTCAGGGGCT 2580 

GGOGGGCOMS GAAAGCGACT TCACCGCaLCC TGATGTGTGG TACCCTGGOG GCSTGGTGAG 2640 

CAGAGTGCCC TATCCCAGTC CCACTTGTGT CAAAAGOGAA ATGGGCCCCT GGATGGATAG 2700 

CTACTCCGGA CCTTACGGGG ACATGCGTTT GGSGACTGCC SGGGACCATG TTTTGCCCAT 2760 

TGACTATTAC TTTCCaCCCC AGAAGACCTG CCTGATCTGT GGAGATGAAG CTTCTGGGTG 2820 

TCACTATGGA GCTCTCACAT GTGGAAGCTG CAAGGTCTTC TTCAAAAGAG CCGCTGAAGG 2880 

GAAACAGAAG TACCTGTGCG CCAGCAGAAA TGATTGCaCT ATTGATAAAT TCCGAAGGAA 2940 

AAATTGTCCA TCTTGTCGTC TTCGGAAATG TTATGAAGCA GGGATGACTC TGGGAGCCCG 3000 

GAAGCTGAAG AAACTTGGTA ATCTGAAACT ACAGGAGGAA GGAGAGGCTT CCAGCACCAC 3060 

CAGCCCCACT GAGGAGACAA CCCAQAAGCI GACAGTSICA CACAITGAAG GCTATGAATG 3120 

TCAGCCCATC TTTCTGAATG TCCTGGAAGC CATTGAGCCA GGTGTAGTGT GTGCTGGACA 3180 

CGACAACAAC CAGCCCGACT CCTTTGCAGC CTTGCTCTCT AGCCTCAATG AACTGGGAGA 3240 

GAGACAGCTT GTACACGTGG TCAAGTGGGC CARGGCCTTG CCTGGCTTCC GCAACTTACA 3300 
CGTGGACGAC CAGATGGCTG TCATTCAGTA CTCCTGGR 
GGGCTGGCGA TCCTTCACCA ATGTCAACTC CAGGATGCTC T. 

TTTCAATGAG TACCGCATGC ACAAGTCCGG GATGTACAGC CAGTGTGTCC GAATGAGGCA 3480

CCTCTCTCAA GAGTTTGGAT GGCTCCAAAT CACCCCCCAG 3AATTCCTGT GCATGAfiAGC 3540 
ACIGCTACTC TTCAGCA: 
ACTTCGAATG AACTACATCA A 

CACATCCTGC TCAAGACGCT TCTACCAGCT CACCAAGCTC CTGGACTCCG TGCAGCCTAT 3720
TGCGAGAGAG C 



A GATGTCTTCT G 

G GGAATTTCCT CTATTGATGT A 

TATTTGCTGG GCTTTTTTTT TCTCTTTCTC TCCITTCTTT TTCTTCTTCC CTCCCTATCT 4080 

AACCCTCCCA TGGCACCTTC AGACTTTGCT TCCCATTGTG GCTCCTATCT GTGTTTTGAA 4140 

TGGTGTTGTA TGCCTTTAAA TCTGTGATGA TCCTCATATG GCCCAGTGTC AAGTTGTGCT 4200 

TGTTTACAGC ACTACTCTGT GCCAGCCACA CAAACGTTTA CTTATCTTAT GCCACGGGAA 4260 

G CTAAGATTAT CTGGGGAAAT CAAAACAAAA AACAAGCAAA CAAAAAAAAA 4320 



1 11 21 31 41 51 

I I I I I I 

MBVQLGLGEV YPRPPSKTYE GAPQNLPQEW EBVIQNPGPE HPBAAEftAPP GASLLLLQQQ 
QQQQQQQQQQ QQQQQQQQBT SPRQQQQQQG BDGSPQAHEE GPTGYLVLDB EQQPSQPQSA 
LECHPERGCV PEPGAAVAAS KGLPQQLPAP PDEDDSAAPS TLSLLGPTFP GL5SCSADLK 
DILSEASTHQ LLQQQQQEAV SEGSSSGRAK EASGAPTSSK DNYLGGTSTI SDNAKELCKA 
VSVSMGLGVE ALEHLSPGEQ LRGDCMYAPL LGVPPAVRPT PCAPLAECKG SLLDDSAGKS 
TEDTAEYSPF KGGYTKGIUEG ESLGCSGSAA AGSSGTLELP STLSLYKSGA LDEAAAYQSR 
DYYHFPLALA GPPPPPPPPH PKARIKLEHP LDYGSAWAAA AAQCRYGDLA SLHGA6AAGP 
GSGSPSAAAS SSWHTLFTAE EGQLYGPCGG GGGGGGGGGG GGGGGGGGGG GGEAGAVAPY 
GYTRPPQGLA GQESDFTAPD VWYFGGMVSR VPYPSPTCVK 5EMGPHMDSY SGPYGDMRLE 
TARDHVLPID YYPPPQKTCL ICGDEiASGCH YGALTCGSCK VPPKHAAEGK QKYLCASHND 
CTIDKFRRKK CPSCRLRKCY EAGMTLGARK LKKLGHLKLQ EEGEASSTTS PTEETTQKLT 
VSHIEGYECQ PIPIJJVLEAI EPGWCAGHD HMQPDSFAAL LSSIjNELGER QLVHWKWAK 
ALPGFSNUIV DDQ^4AVIQYS HMGLMVFAMQ HRSFTNVNSR MLYFAPDLVF HEYRNHKSRM 
YSQCVRMRHL SQEPGWLQIT PQEPLCMKAL LLPSIIPVDG LKNQKPPDEL RMNYIKELDR 
IIACKRKNPT SCSRRFYQLT KLLDSVQPIA RELHQPTFDL LIKSHMVSVD PPEMMAEIIS 
VQVPKILSGK VKPIYFHTQ
TATCTCATCT CAATCCTCTA AATAACCATG AAAGTTGATG ATTATCTCAT GGTACAGATG
GGAGGCTAAG AGTGTTTAAT TTTCCCCAAG TTCCAGTGCT AGTAAGTGTT GNNNNNNNMN 
NNTGAACCTG TGTTAATGGT GTTTCTAGTC GATGCTGTTA TCTGTTGCAC CACATTTTGA 






CTCTCTCTCG CTCCTCTGTA ACAATTGGAG AAACAGAGTT CTAAC»ATAI TAAftATCAGC 
A AGCAGTTTCT 



Seq ID NO: 27 DNA sequence 

Nucleic Acid Accession #: NM_006551.2

Coding sequence: 64.. 336 

I I I I I I

AATTCTAGAA GTCCAAATCA CTCATTGTTT GTGAAAGCTG AGCTCRCAGC AAAACAAGCC 

ACCAT3AAGC TGTCGGTGTG TCTCCTGCTG GTCRCGCIGG CCCTCTGCTS CTACCAGGCC 

AATGCCGAGT TCTGCCCRGC TCTTGTTTCT GAGCTGTTAG ACTTCTTCTT CATTAGTGAA 

CCTCTGTTCA AGTTAAGTCT TGCCAAATTT GATGCCCCTC OSGAAGCTGT TGCAGCCJlAG 

TTAGGAGTGA AGAGATGCAC GGATCAGATG TCCCTTC3U3A AACGAAGCCT CATTGCGGAA 

GTCCTGGTGA AAATATTGAA GAAATGTAST GTGTGACATG TAAAAACTTT CATCCTGGTT 

TCCACTGTCr TTCAATGACA CCCTGATCTT CACTGCAGAA TGTAAAGGTT TC3UICGTCTT 

GCTTTAATAA ATCACTTGCT CTAC 



Seq ID NO: 29 DNA sequence

Nucleic Acid Accession #: NM_002645.1
Coding sequence: 1..5061 

I I I I I I

ATGGCTCAGA TATITAGCAA CAGCGGATTT AAAGAATGTC CATTTTCACA TCCGGAACCA 
ACAAGAGCAA AAGATGTGGA CAAAGAAGAA GCATTACAGA TGGAAGCAGA GGCTTTAGCA 
AAACTGCAAA AGGATAGACA AGTGACTGAC AATCAGAGAG GCTTTGAGTT GTCAAGCAGC 
ACCAGAAAAA AAGCACAGGT TTATAAC&AG CAGGATTATG ATCTCATGGT GTTTCCTGAA 
TCAGATTCCC AAAAAAGAGC ATTAGATATT GATGTAGAAA AGCTCACCCft AGCIGAACTT 
GAGAAACTAT TGCTGGATGA CAGTTTCGAG ACTAAAAAAA CACCTGTATT ACCAGTTACT 
CCTATTCTGA GCCCTTCCTT TTCAGCACAG CTCTATTTTA GACCTACTAT TCAGAGAGGA 
CAGTGGCCAC CTGGATTACC TGGGCCTTCC ACTTATGCTT TACCTTCTAT TTATCCTTCT 
ACTTACAGTA AACAGGCTGC ATTCCAAAAT GGCTTCAATC CRAGAATGCC CACITTTCCA 
TCTACAGAAC CTATATATTT AAGTCTTCCG GGACAATCTC CATATTTCTC ATATCCTTTG 
A CACCCTTTCA TCCACAAGGA AGCTTACCTA TCTATCGTCC AGTAGTCAGT 
iACTATT TGACAAAATA GCTAGTACAT CAGAATTTTT AAAAAATGGG 
AAAGCAAGGA CTGATTTGGA GATAACAGAT TCAAAAGTCA GCAATCTACA GGTATCTCCA 
AAGTCTGAGG ATATCAGTAA ATTTGACTGG TTAGACTTGG ATCCTCTAAG TAAGCCTAAG 
GTGGATAATG TGGAGGTATT AGACCATGAG GAAGAGAAAA ATGTTTCAAG TTTGCTAGCA

AAGGATCCTT GGGATGCTGT TCTTCTTGAA GAGAGATCGA CAGCRAATTG TCATCTTGAA 
AGAAAGGTGA ATGGAAAATC CCTTTCTGTG GCAACTGTTA CAAGAAGCCA GTCITTAftAT 
ATTCGaflCAA CTC3W3CTTGC AAAAGCCX^AG GGCC3VTATAT CTOVGAAAGR CCCSUiATGGG 
ACCAGTAGTT TGCCAACTGG AAGTTCTCTT CTTCAAGAAG TTGAAGTACA GAATGAGGAG 
ATGGCAGCTT TTTGTCGATC CATTACftAAA TTQAAGACCA AATTTCCATA TAOCAATCAC

CGCACAAACC CAGGCTATTT GTTAAGTCCa GTCACAGCGC AAAGAAACAT ATGCGGAGAA : 
AATGCTAGTG TGAAGGTCIC CATTGACATT GAAGGATTTC AGCTACCAGT TACTTTTACG : 
TGTGATGTGA GTTCTACTGT AGAAATCATT ATAATGCAAG CCCTTTGCTG GGTACATGAT 
GACTTGAATC AAGTAGATGT TGGCRGCTAT GTTCTAAAAG TTTGTGGTCA AGAGGAAGTG : 
CTGCAGAATA ATCATTGOCT TGGAAGTCAT GAGCATATTC AAAACTGTCG AAAATGGGAC
ACAGAAATTA GACTACAACT CTTGACCTTC AQTGCAATGT GTCAAAATCT GGOCCGAACA 
GCaGAAGATG ATGAAACACC OSTGGRTTTA MiCMACACC TGTATCAAAT AGARAAACCT : 
TGCAAAGAAG CCATGACSAG ACACCCTGTT GASGAACTCT TAGATTCTTA TCACAACCAA : 
GTAGAACTGG CTCTTCAAAT TGAAAACCAA CACCGAGCAG TAGATCAAGT AATTAAAGCT
GTAAGAAAAA TCTGTAGTGC TTTAGATGGT GTCGAGACTC TTGOaTTAC AGAATCAGTA
AAGAAGCTAA AGAGAGCAGT TAATCTTCCA AGGAGTAAAA CTGCTGRTGT GACmCTTTG 
TTTGGAGGAG AAGACACTAG CAGGAGTTCA ACTAGGGGCT CACTTAATCC TGAAAATCCT 
GTTCAAGTAA GCATAAACCA ATTAACTGCA GCAATTTATG ATCTTCTCAG ACTCCATGCA 
3TCCTAC AGACTGTGCC CAAAGTAGCA AGAGTGTCfiA GGAAGCATGG 
G AGCAGCTCCA GTTTACTATT TTTGCTGCTC ATGGAATTTC AAGTAATTGG 2100 
GTATCAAATT ATGAAAAATA CTACTTGATA TGTTCACTGT CTCACAATGG AAAGGATCTT 2160 
TITAAACCTA TTCAATOUiA QAAGGTTGGC ACTTACAAGA ATTTCTTCTA TCTTATTAAA 2220 
TGGGATGAAC TAATCATTTT TCCTATCCAG ATATCACAAT TGCCATTAGA ATCSGTTCTT 2280
CMCTTACTC TTTTTGGaAT TTTAAATCRG AGCAGTGGftA GTTCKCCTGA TTCTAATAAG 2340

CftGAGRAAGG GACCAGAAGC TTTGGGCARR GTTTCTTTAC CTCTTTGTGA CTTTAGftOSG 2400 

TTTTTAACAT GTGGAACTAA ACTTCTATAT CTTTGGRCTT CATCACATAC AAATTCTGTT 2460 

CCTGGAACAG TTACCAAAAA AGGATATGTC ATGGAAAGAA TAGTGCTACa GGTTGATTTT 2520 

CCTTCTCCTG CATTTGATAT TATTTATACa ACTCCTCAAG TTGACAGAAG CATTATACAG 2580

CAACATAACT TAGAAACACT AGAGAATGAT ATAAAAGGGA AACTTCTTGA TATTCTTCAT 2640 

AAAGACTCAT CACTTGGACT TTCTAAAGAA GATAAAGCTT TTTTATGGGA GAAACGTTAT 2700 

TATTGCTTCA AACACCCAAA TTGTCTTCCT AAAATATTAG CAAGCX3CCCC AAACTGGAAA 2760 

TGGGGTAATC TTGCCAAAAC TTACTCATTG CTTCACCRGT GGCCTGCATT GTACCCaCTA 2820 

ATTGCATTGG AACTTCTTGA TTCAAAATTT GCTGATCaGG AAGTAAGATC CCTAGCTGTG 2880

ACCTGGATTG AGGCCATTAG TGATGATGAG' CTAACAGATC TTCTTCCACA GTTTGTACAA 2940 

GCTTTGAAAT ATGAAATTTA CTTGAATAGT TCATTAGTGC AATTCCTTTT GTCCAGGGCA 3000 

TTGGGAAATA TCCAGATAGC ACACAATTTA TATTGGCTTC TCAAAGATGC CCTGCRTGAT 3060 

GTACAGTTTA GTACCCGATA CGAACATGTT TTGGGTGCTC TCCTGTCAGT AGGAGGAAAA 3120 

CGACTTAGAG AAGAACTTCT AAAACAGACG AAACTTGTAC AGCTTTTAGG AGGAGTAGCA 3180

GAAAAAGTAA GGCAGGCTAG TGGATCAGCC AGACAGGTTG TTCTCCAAAG AAGTATGGAA 3240 

CGAGTACAGT CCTTTTTTCA GAAAAATAAA TGCCGTCTCC CTCTCAAGCC AAGTCTAGTG 3300

GCAAAAGAAT TAAATATTAA GTCGTGTTCC TTCTTCAGTT CTAATGCTGT CCCCCTAAAA 3360

GTCACAATGG TGAATGCTGA CCCTCTGGGA GAAGAAATTA ATGTCATGTT TAAGGTTGGT 3420 

GAAGATCTTC GGCAAGATAT GTTAGCTTTA CAGATGATAA AGATTATGGA TAAGATCTGG 3480

CTTAAAGAAG GACTA6ATCT GAGGATGGTA ATTTTCAAAT GTCTCTCAAC TGGCAGAGAT 3540 

CGAGGCATGG TGGAGCTGGT TCCTGCTTCC GATACCCTCA GGAAAATCCA AGTGGAATAT 3600

GGTGTGACAG GATCCTTTAA AGATAAACCA CTTGCAGAGT GGCTAAGGAA ATACAATCCC 3660 

TCTGAAGAAG AATATGAAAA GGCTTCAGAG AACTTTATCT ATTCCTGTGC TGGATGCIGT 3720 

GTAGCCACCT ATGTTTTAGG CATCTGTGAT CGACACAATG ACAATATAAT GCTTCGAAGC 3780

ACGGGACACA TGTTTCACAT TGACTTTGGA AAGTTTTTGG GACATGCACA GATGTTIGGC 3840 

AGCTTCAAAA GGGATOGGGC TCCTTTTGTG CTGACCTCTG ATATGGCATA TGTCATTAAT 3900 

GGGGGTGAAA AGCCCACCAT TCGTTTTCAG TTGTTTGTGG ACCTCTGCTG TCAGGCCTAC 3960 

AACTTGATAA GAAAGCAGAC AAACCITTTI CTTAACCTCC TTTCACTGAT GATTCCTTCA 4020 

GGGTTACCAG AACTTAC!^ TATT(»AGAI TTGAAATAOQ ITA6AGATGC ACITCAACCC 4080

CAAACTACAG ACGCAGAAGC TACAATTTTC TTTACTAGGC TTATTGAATC AAGTTTGGGA 4140 

AGCATTGCCA CAAAGTTTAA CTTCTTCATT CACAACCTTG CTCAGCTTCG TTTTTCTGGT 4200 

CTTCCTTCTA ATGRTGAGCC CATCCTTTC3V TTTTCACCTA AftACATACTC CTTTAGACAA 4260 

GATGGTCGAA TCAAGGAAGT CnCTGITTTT ACATATCATA AGAAATACAA CCKAGATAAA 4320 

CftTTATATTT ATGTAGTCCG AATTTTGTGG GAAGGACMA TTGAACCATC ATTTGTCTTC 4380

CGAACATTTG TOGAATTTCA GGAACTTCAC AATAAGCTCA GTATTATTTT TCtaCTTTGG 4440 

AAGTTACKAG GCTTTCCTAA TAGGATGGTT CTAGGAAGAA CACACATAAA AGRTGTAGCA 4500 

GCCSkAAAGGA AAATTGAGTT AAACAGTTAC TTACAGAGTT TGATGAATGC TTCAACGGAT 4560 

GTAGCRGAGT GTGATCTTGT TTGTACTTTC TTCCACCCTT TACTTCGTGA TGAGAAAGCT 4620 

GAAGGGATAG CTAGGTCTGC AGATGCAGGT TCCTTCAGTC CTACTCCAGG CCAAATAGGA 4680

GGAGCTGTGA AATTATCCAT CTCTTACCGA AATGGTACTC TTTTCATCAI GGTGATGC31T 4740 

ATCAAAGATC TTGTTACTGA AQATGGAGCT GACCCAAATC CATATGTCAA AACATAOCTA 4800 

CTTCCAGATA ACCACAAAAC ATCCAAACGT AAAACCAAAA TTTCACGAAA AACGAGGAAT 4860 

CCGACATTCA ATGAAATGCT TGTATACAGT GGATATAGCA AAGAAACCCT AAGACAGCGA 4920 

45 GAACTTCAAC TAAGTCTACT CAGTGCAGAA TCTCTGCGGG AGAATTTTTT CTTGGGTGGA 4980 

GTAACCCTGC CTTTGAAAGA TTTCAACTTG AGCfiAAGAGA CGGTTAAATG GTATCAGCTG 5040 
ACTGCGGCAA CATACTTGTA A 

Seq ID NO: 30 Protein sequence 
Protein Accession #: NP_002636.1

1 11 21 31 41 51 

I I I I I I 

MRQIPSNSGP KECPPSHPEP TRAKDVDKEE ALQMEAEAIA KLQKDRQVTD NQRGFELSSS 60 

TRKKAQVYNK QDYDLMVPPE SDSQKRALDI DVBKLTQAEL BKLLLDDSPB TKKTPVLEVT 120

PILSPSPSAQ LYPRPTIQRG QWPPGLPGPS TYALPSIYPS TYSKQAAPQN GPNPRMPTFP 180 

STEPIYLSLP GQSPYFSYPL TPATPFHPQG SLPIYRPWS TDMAKLFDKI ASTSEFLKNG 240 

KARTDLEITD SKVSNLQVSP KSEDISKFDH LDLDPLSKPK VDNVEVLDHB EEKNVSSLLA 300 

KDPWDAVLLE ERSTANCHLE RKVHGKSLSV ATVTRSQSLN IRTTQLAKAQ GHISQKDPNG 360 

TSSLPTGSSL LQEVEVQNEE MAAFCRSITK LKTKFPYTNH RTNPGYLLSP VTAQRNIOGE 420

KASVKVSIDI EGFQLPVTPT CDVSSTVEII IMQALCWVHD DUIQVDVGSY VLKVCGQEEV 480 

LQtJMHCLGSH EHIQNCRKMD TEIRLQLLTF SAMCQHIiART ABDDBTPVDL NKHLYQIEKP 540 

CKBAMTRHPV EELLDSYHMQ VELALQIEHQ HRAVDOVIKA VKKICSALDG VETLAITESV 600 

KKLKRAVNLP ESKTADVTSL PGGEDTSESS TRGSLHPENP VQVSINQLTA AIYDLLRLHA 660 

NSGRSPTDCA QSSKSVKBAW TTTBQLQFTI FAAHGISSNW VSMYBKYYLI CSLSHNGKDL 720

PKPIQSKKVG TYKNPPYLIK WDELIIFPIQ ISQLPLESVIi HLTLFGILHQ SSGSSPDSNK 780 

QRKGPEALGK VSLPLCDFEE PLTCGTKLLY LWTSSHTNSV PGTVTKEGYV MERIVLQVDP 840 

PSPAFDIIYT TPQVDRSIIQ QHNLBILBND IRGKLIMLH KDSSLGLSKB DKAFLWEKEY 900 

YCPKHPNCLP KILASAPNWK WGNLAKTYSL IBQWPAIjYPL IALELLDSKF ADQEVRSLAV 960 

TWIEAISDDE LTDLLPQFVQ ALKYEIYLNS SLVQPLLSEA LGNIQIAHNL YWLLKDALHD 1020

VQFSTRYBHV LGALLSVGGK RLREELIiKQT KLVQLIiGGVA EKVRQASGSA RQWIiQRSME 1080 

RVQSPPQKNK CRLPLKPSLV AKELNIKSCS FPSSNAVPLK VTMVNADPLG EEINVMPKVG 1140 

EDLRQD^4LAL QMIKXMDKIW LKEGIXIt,RHV IFKCLSTGBB RGHVELVPAS DTLRKIQVEY 1200 

GVTGSFKDKP LAEWLRKYNP SEEEYEKRSE NFIYSCAGCC VftTYVLGICD RHNDUIMLRS 1260 

TGHMPHIDPG KPLGHAQMFG SFKEDRAPPV LTSDMAYVIN GGEKPTIRPQ LFVDLCCQAY 1320

NLIRKQTNLP LNLLSLMIPS GLPELTSIQD LKYVRDALQP QTTDAEATIP PTRLIESSLG 1380 

SIATKFNFFI HNLAQLRPSG LESNDEPILS FSPKTYSFEQ DGRIKEVSVF TYHKKYBPDK 1440 

HYIYWRILW EGQIEPSFVF RTFVEFQELH NKLSIIFPLW KLP6FPNEMV LGRTHIKDVA 1500 

AKRKIELNSY LQSLMNASTD VAECDLVCTF PHPLLRDEKA EGIARSADAG SPSPTPGQIG 1560 

GAVKLSISYR N6TLFIMVMH IKDLVTEDGA DPNPYVKTYL LEDNHKTSKR KTKISRKTEtr 1620

PTFNEMLVYS GYSKBTLRQE ELQLSVLSAE SLRENFFLGG VTLPLKDFNL SKETVKWYQL 1680 
TAATYL
T ACATTCTGGA ATTTGTAAGG G
TTGGTATTTT CACTGTCAAT TATGCCTCGT ATTATTTATT TATTTGCCAA AATACGACTG 
TACCTCA TAGAGCTCRT GACACATAAT AGGTRTTCAC T 
r AAGCACTCAC ATCAATA 
AGATGAGCAC TGACTTTCCC CATTGAGGAG TCTCGATTAC CTCATGTCTC ACTTCAftACA 



TTCTTTTTCT TTTGTTTTTG TAGGTCCTAT AATACAGTAA AGACaTCAAR TAGACC 



1 11 21 31 41 51 

| | | | | |

CftGTCftGATT TTTTTTTTGC TTAACTAAGR CAftAGTGftAT ftATTCACTGT GAGCCAAATT 
CITTCTTGAT TCCTCTTTTT GGAGCAGTCC ATCTTTATGG GAAAACCAGC CTAGAATGGT 
GATTTCAGTT TCAGGTGATT TCGATAGAAT TGTATTTGGC TCAGAAATGA T 
GCCAAGAAAA ATTTTAAACT TTTTTTTTTG TAATCATATT ACTAGTTTGA T 
ACTTCCTTTG TTGACTTTCT ITGCCATTAA TTTAAAAGTT CCAGTA1 
TGTCTTATAT GTACAGAATC CTTTCCAGCT GTAAGTCATC AGCRAGT 
TGGCAATAGT TTTCATAAGA GGTTTTTTAA AACAGAAAAA TGTTGACATT GCCAGCETCT 
GGGTTGCATT TTGGGATATG CTACATTTCa AAGGTATCTT TTAAATCTGA AGGCAAAGAC 
TTTTTCAAGA TCTGAATATT CTGATTTACA GAAATTATAA AAAAAAAAGT CGACGCB 



| | | | | |

TTTTAAGATG GAGITTTCGC TCTTGTTGCC CAGGCTGGAG TGCAGTGGTG CAATCTTGGC 
TCACTGCAAC CTCTGCCCCC CGGGTTCAGG CGATTCTCTC CTCTCAGTCT CTCAAGTAAC 

TGGGATTACA GGCACACACC ACCACGGCCA GCTAATTATT TTGTATTTTG AGTCGAGAGG 
r CAGGTAATCC GCCCGCCICG GCCTCCCAAA GTGTTGGGAT 
C GGTGCCCGGC CCGTTTTATT GTTTGAAAAA CAftGTACAGG 
iTAGAGT ATATACTGTA TTTGRAGTCT AGAACTGAGG 
CAGAGGCTGA TTAATATAAC TAGTTTACAT TTGTTAGCCT TTCACATCTG TGAAGGAATA 
AAGTACAGAC AAAAGTGQAA AACAAACCAG AAAAAAAAAA ATTGTGAAGC ACAGAGCTGC 
TTAAAAGAGT GGTGTCACAT TAAAAGAAAA AAGTCACAGA AATAAGTCAG TATTTTGTTT 
AGAGACTAGA ACTCCAACTG CTAGOCAACT GCCTAGAATA TAGTAAATAT TTTCTAGTTT 
CTTAAATGAC TAGTAATATT CCTACATTAT GTGATGGCAT TTCCCAAACT GTTTAATTAG 
ATGTTAGATT TGTAGCCAAA TATGTCTAGG AAATGCTTAA ACAATATAAA ACAGTTTTAA 
TGATTGGCTT TTTAGAACGT TATATATTAG TGTGCTTTAT GCATATCCAfl GAGGTGAGTG 
AGGTATTTGG GGTTTTTCAG ACTTACTTGA TTACAGATCT GGAGTATCTC AAAACAGTTG 
3 CAAACTCTGA GTCTTAGTCA TTAAAAATAG TTTTTGGGTA 



TCAGCGATTT TTATAGAAGT TGCTTTATGA CAAAGAAAGC TTTGGTTAAC T 
CATTTCACAC CCCTAAATTT TCTACATGAG GATTTATTTC TCTGGTTCTC TCACTTTCTC 
ACTCAGTTAT ACTGAATTCA TTTATGATGA GCGCTCTCAA CCATTCTTAT TCATCAAAGC 
TGAAGTTGGC AGAGCCCTCT CTGGTACCTG ATTAGAAGTC CGTCTTCCGT CTCATAGGGA 
AGTGTTAGAG ATGGATAATG TTTCTGTGTA GCAGAAGTAG TCATTATGTC CCCTTAAATT 

GAGAAGCTCA TGGTGGGGGA AACCTGGAAT TTATCTAAAT ATTTCATTTC TTTGATAAAT 
TACATTAAAA AATTAATAAG AGTATCTATI TG3TGAAATC ATTTTCCTCC ACGTGACCAA 



AGCAAACCCC TTTTTTAAGG CAATGTCAGT TATTAAGCTT TAGGGAACCA CATGCCACTT 1560 

TAGGTAACAC ATGATTGGAG AGATTGAASA GTGAAGTCCC TGCTTTAAAG TGTACTCCTG 1620 

TGGACACAGT AATGCATATA TTTAAAATGG TTCATGTTAA GAGTAGGTAT ATTTCTATCT 1680 

AAATACICTG TAGCTTTTGT GATTCAGGGA AATGAGTGGA GCCTCACAGG CACAAGAATC 1740 

TAGTAAATTC TAGGTTTCTT GTGTGGAACT CAOTGGGCSiA AATCTTAACT GAGTGAATTC 1800 

TTGATTATTG GTATCACATT TATTAGTCTG TATCTATCTG TGTCATCGAT CTCCTTAAGA 1860 

AGAGACTCGT AGATATTGAC TGGGAGACCC AAGCTGAATG CTAAAATCTG CTCCATGGAT 1920 

ATAAGCTGAT GCAGTCATCA TTTCACATTA AAATGTACCA CAGCTATATA TGCCGCftAAA 1980 
AAAAAAAAAA AAAA 

Seq ID NO: 34 DNA sequence 

Nucleic Acid Accession #: CAT cluster 

1 11 21 31 41 51 

| | | | | |

CTACTACTAA ATTCGCGGCC GCGTCGACTT tTTTTTTTTT TTGTCTTATG TCTCTAATCT 60 

GCACTGTTCA GCTCTTTTAG GCACTGCAAA GTTGTCTTGA ATTAGGAAAG AGGTGOTAGA 120 

ATGTGGGCGT GGGTGTTGAC CTACATCTGA ACAATTTACA TATGATTCAC CACAATTAAA 180 

CAATTTGGTT TGRAATAGCT ATAATTAAGT TATTATCAGA GAAGTATTTA CTAGTCTAGA 240 

AATTCTAAAT TTATCTTCXC ATACACCCTA ACTGAGAAAA GGGCCACATT TTCTGCACTC 300 

TATTAAGTAA AGCAAATGCT GAAOTAAATG CCTCCATGTT AACATTTATA TTGTTAAGTT 360 

ACTGACAGCa TATTCTATGA ATGATTACGT TAGTCGTTTC TTTAAAAATT ATAGGTTTGA 420 

AATAGCAAGA AAAATATGAA ATGATGGTAG ACAAAAAAGA GTTTCAGTIT CTAACTTCTA 480 

ACTATATATA TACACACACA CATGCACACA GAATTGCCTT CCCGGATGTA TAGAARTTAT 540 






ATACRGCCAT GTCCAGGCNC GATGGAAATT ATGGGGGAAT ATCCAANTTA GGATACNCGT 600 
GCCC3AAT0GC OSGGTNTAAA TAATAOJGGT TTATAATGGA dJATCCACAA TCCTGGTTTA 

Seq ID NO: 35 DNA sequence 
Nucleic Acid Accession #: NM_018490.1 
Coding sequence: 445..3300 



CCGCGGCTGG GAGACAGCGA GCCAGAGTCT GGGTGTT7GT GCGAGAGCCA CGGCGGGGGC 60 

TGGGGCGAGT GGCCGGCRTG GCTGAAGGCT GCGCTCTGCA ACCTTG.VlGA GCCGCTGCAT 120 

TGAGAGGCCA GGGACAGGGA GACCGGTGCG ATGGCAGAGC GCGGCCCCCG CCGCTGCGCC 180 

GGGCCGGCCC GGCTGGCCTG AGCCGCCGGA GGAGCGGGGC TGCCTCTGCG CGTCCATGGA 240 

GCAGCGGGAA GGGCGAAACT CCGGAGCGCC GCGTCCCTGC GCCGCTGCGG CGGACTGCTG 300 

AAGGGGCCGA GCCCGCGCGG ACCGCCGAGG AAGAGACCCC CGCTCCAGCC CGCAGGCCKG 360 

CTGCCCGGGG GCGGCGGGGG ACATCGGAGG GCAGCGGAGC GAGCAGCGCC GCGGGAGAGG 420 

CCGGCGCGGG AGGCGGCCGC AGCAATGCCG GGCCCGCTAG GGCTGCTCTG CTTCCTCGCC 480 

CTGtSGGCTGC TCGGCTCGGC CGGGCCCAGC GGCGCGGCGC CGCCTCTCTG OSCGGCGCCC 540 

TGCftGCTGCG ACGGCGACCG TCGGGTGGAC TGCTCCGGGA AGGGGCTGAC GGCCGTGCCC 600 

GAGGGGCTCA GCGCCTTCAC CCAAGCGCTG GATATCAGTA TGAACAACAT TACTCAGTTG 660 

CCASAAGATG CRTTTAAGAA CTTTCCTITT CTAGAAGAGC TACAATTGGC GGGCftACGAC 720 

CTTTCTTTTA TCCACCCAAA GGCCTTGTCT GGGTTGAAAG AACTCAARGT TCTAACGCTC 780 

CAGAATAATC AGTTGAAftAC AGTACCCftGT GAAGCCATTC GAGGGCTCAG TGCTTTGCAG 840 

TCTTTGCGIT TSGATGCKAA CCMATTACC TCAGTCCCCG AGGACAGTTT TGAAGGACTT 900 

GTTCRGTTAC GGCATCTGTG GCTGGATGAC AACAGCTTGA CGGRGGTGCC TGTGCACCCC 960 

CTCAGCftATC TGCCCACCCT ACAGGCGCTG ACCCTGGCTC TCAACAAGAT CTCAAGCATC 1020 

CCTGACTTTG CATTTACCAA CCTTTCAAGC CTGGTAGTTC TGCATCTTCA TAACSUITAAA 1080 

ATTAGAGGCC TGAGTCAACA CTGTTTTGAT GGACTAGATA ACCTGGAGAC CTTAGACTTG 1140 

AGTTATAATA ACITGGGGGA ATTTCCTCAG GCTATTAAAG CCCGTCCTAG CCTTAAAGAG 1200 

CTAGGATTTC ATAGTAATTC TATTTCTGTT ATCCCTGATG GAGCATTTGA TGGTAATCCA 1260 

CTCTTAAGAA CTATACRTTT GTATGATAAT CCTCTGTCTT TTGTGGGGAA CICAGCATCT 1320 

CACAATTTAT CTGATCTTCA TTCCCTAGTC ATTCGTGGTG CAAGCATGGT GCAGCAGTTC 1380 

CCCAATCTTA CAGGAACTGT CCACCTGGAA AGTCTGACTT TGACAGGTAC AAAGATAAGC 1440 

AGCATACCTA ATAATTTGTG TCAAGAACAA AAGATGCTTA GGACTTTGGA CITGTCTTAC 1500 

AATAATATAA GAGACCTTCC AAGTTTTAAT GGTTGCCATG CTCTGGAAG& AATTTCTTTA 1560 

CAGCGTAATC AAATCTACCA AATAflAGGAA GGCACCTTTC AAGGCCTGAT ATCTCTAAGG 1620 

ATTCTAGATC TGRGTAGAAA CCTGATACAT GAAATTCACA GTAGAGCTTT TGCCACACTT 1680 

GGGCCAATAA CTAACCTAGA TGTAAGTTTC AATGAATTAA CTTCCTTTCC TACGGAAGGC 1740 

CCGAATGGGC TAAATCAACT GAAACTTGTG GGCAACTTCA AGCTGAAAGA AGCCTTAGCA 1800 

GCARAAGACT TTGTTAACCT CAGGTCTTTA TCGGTACCAT ATGCTTATCA GTGCTGTGCA 1860 

TTTTGGGGTT GTGACTCTTA TGCAAATTTA AACACAGAAG ATAACAGCCT CCAGGACCAC 1920 

AGTGTGGCAC AGGAGAAAGG TACTGCTGRT GCAGCAAATG TCAGAAGCAC TCTTGAAAAT 1980 

GAAGAACATA GTCAAATAAT TATCCATTGT ACACCTTCAA CAGGTGCTTT TAAGCCCTGT 2040 

GAATATTTAC TGGGAAGCTG GATGATTCGT CTTACTGTGT GGTTCATTTT CTTGGTTGCA 2100 

TTATTTTTCA ACCTGCTTGT TATTTTAACA ACATTTGCAT CTTGTACATC ACTGCCTTCG 2160 

TCCAAATTGT TTATAGGCTT GATTTCTGTG TCTAACITAT TCATGGGAAT CTATACTGGC 2220 

ATCCTAACIT TTCTTGATGC TGTGTCCTGG GGCAGATTCG CTGAATTTGG CaTTTGGTGG 2280 

GAAACTGGCA GTGGCTGCAA AGTAGCTGGG TTTCTTGCAG TTTTCTCCTC AGAAAGTGCC 2340 

ATATTTTTAT TAATGCTAGC AACTGTCGRA AGAAGCITAT CTGCAAAAGA TATAATGAAA 2400 

AATGGGAAGA GCAATCATCT CAAACAGTTC CGGGTTGCTG CCCTTTCGGC TTTCCTAGGT 2460 

GCTACAGTAG CAGGCTGTTT TCCCCTTTTC CATAGAGGGG AATATTCTGC ATCACCCCTT 2520 

TTAAACTCAC TAGCATTTTT ATTAATGGCC GTTATCTACA CTAAGCTATA CIGCAACTTG 2640 

GAAAAAGAGG ACCTCTCAGA AAACTCACAA TCTAGCATGA TTAAGCAIGT CX3CTTGGCTA 2700 

ATCTICACCA ATTGCATCTT TTTCTGCCCT GTGGCBTTTT TTTCATTTGC ACCATTGATC 2760 

ACTGCAATCT CTATCAGCCC CGAAATAATG AAGTCTGTTA CTCTGATATT TTTTCCATTG 2820 

CCTGCTTGCC TGftATCCAGT CCTGTATGTT TTCTTCSU^CC CAARGTTTAA AGAAGRCTGG 2880 

AAGTTACTGA AGOSACGTGT TACCAAGAAA AGTGGATCAG TTTCAGTTTC CATCAGTAGC 2940 

CAAGGTGGTT GTCTGGRACA GGATTTCTAC TACGACIGTG GCATGTACTC ACATTTGCAG 3000 

GGCAACCTGA CTGTTTGCGA CTGCTGCGftA TCGTTTCTTT TAACAAASCC AGTATCATGC 3060 

AAACACTTGA TAAAATCACA CAGCTGTCCT GCATTGGCAG TGGCTTCTTG CCAAAGACCT 3120 

GAGGGCTACT GGTCCGRCTG TGGCRCAC3«3 TCX3GCCCACT CTGATTATGC AGATGAAGAA 3180 

GATTCCTTTG TCTCAGRCAG TTCTGACCRG GTGCAGGCCT GTGGACGRGC CIGCTTCTAC 3240 

CAGAGTAGAG GATTCCCTTT GGTGCBCTAT GCTTACSATC TACCAAGAGT TAAAGACTGA 3300 

ACTACTGTGT GTGTAACCGT TTCCCCCGTC AACCAAAATC AGTGTTTATA GAGTGAACCC 3360 

TATTCTCATC TTTCATCTGG GAAGCACTTC TGTAATCACT GCCTGGTGTC ACTTAGAAfSA 3420 

AGGAGAGGTG GCAGTTTATT TCTCAAACCA GTCATTTTCA AAGAACAGGT GCCTAAATTA 3480 

TAAATTGGTG AAAAATGCAA TGTCCAAGCA ATGTATGATC TGTITGAAAC AAATATATGA 3540 

CTTGAAAAGG ATCTTAGGTG TAGTAGAGCA ATATAATGTT AGTTTTTTCT GRTCCRTAAG 3600 

AAGCAAATTT ATACCTATTT GTGTATTAAG CACAAGATAA AGAACAGCTG TTAATATTTT 3660 

CCTAATGTTT CATCCTTAAT CTCAGGACAA CTTACTGCAG GGCCAAAAAA G3GACIGTCC 3780 

CAGCTAGAAC TGTGAGAGTA TACATAGGCA TTACTTTATT ATGTTTTCAC TTGCCATCCT 3840 

TTTTAAAACA ATATTAACAG CTGTTAGGTT AAAAAAATAG CTG3ACATTT GTT7TCAGTC 3960 

ATTATACATT GCITTGGTCC AATCAGTAAT TTTTTCTTAA GTGTT7T3TG ATTACACTAC 4020 

TAGAAAAAAA GTAAAAGGCT AATTGCTGTG TGGGTTTAGT CGATTTGGCT .Vy^CTACTAA 4080 

CTAATGTGGG GGTTTAATAG TATCIGAGGG ATTTGGTGGC T7CATGTAA7 GTTCTCATTA 4140 

ATGAATACTT CCIAATATCG TTGGCTCTAC TAATATTTTC CAATTTGCTG GGATGTCACC 4200 

TAGCAATAGC TTGGATTATA TAGAAAGTAA ACTGTGGTCA ATACTTGCAT TTAATTAGAC 4260 

GAAACDGGGGA GTAATTATGA CAC3GAAGTAC TTATGTTTAT TTCTTAGTGA GCTGGATTAT 4320 

CTTGAACCTG TGCTATTAAA TGGAAATTTC CATACATCTT CCCCATACTA TTTTTTATAA 4380 

AAGAGCCTAT TCAATAGCTC AGAGGTTGAA CTCTGGTTAA ACAAGATAAT ATGTTATTAA 4440 
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TC-GGAAfiTGT 



GAAGGATTTA TTTACAGTGT GTTGTAATTT TGTAAGGCCA ACTATTTACA 
A TGTATATTTA CACATCTGAT AAATATTAAA TCATAACITG 
^TTTTTC CAAAATTCAG GTTATTGAAA ATTTTTCATT 
AAAAACTAGA ATAACAGATA TATAAAAGTG TTAATCTTTG TGCTATATGG 
AATATTGTftC TCftGTGITTT GAATTATTRA AGTTTCTAGA AftGCAAAAAA 



MP6EU3LLCF LALGLLGSAG FSGAAPFLCA AECSOXSDRR VDCSGKGLTA VPEGLSAFTQ 
ALDISMMNIT QLPEDAFKNP PPLEELQLAG NDLSPIHPKA LSGLKELKVL TLQtWQLKTV 
PSEAIR6LSA LQSLRLDANH ITSVPEDSFE GLVQLRELWL DHHSLTEVPV HPLSNLPTLQ 
ALTLALHKIS SIPOFAFTHL SSLWIJILHN NKIRGLSQHC FDGLDHLETL DLSYHIILGEF 
PQAIKARPSL KELGPHSNSI SVIPDGAPDG NPLLRTIHLY nnPLSFVGHS ASEHLSDLHS 

lvirgashvq qfpnltgtvh lesltltgtk issipnhlcq eqkmlrtldl smuppiiPS 

FNGCHALEEI SLQHNQIYQI KEGTFQGLIS LRILDLSEHL IHBIHSRAFA TLGPITNIiDV 
SPNELTSPPT EGPNGLHQLK LVGtlPKLKEA LAAKDFVNLR SLSVPYAYQC CAPHGCDSyA 
NLHTEimSLQ DHSVAQEKOT ADAANVTSTL EHEEHSQIII HCTPSTGAPK PCEYI.LGSWM 
IRLTVWFIFL VALFFNLLVI LTTFASCTSL PSSKLFIGLI SVSNLFMGIY TGILTFLDAV 
SWGRPAEPGI WHETGSGCK7 AGPLAVPSSE SAIPLLMLAT VERSLSAKQI MKNGKSNHLK 
QPRVAALSAP LGRTVAGCPP LPHRGBYSAS PLCIiPFPTGE TPSLGPTVTL VLUISLAPLL 
MAVIYTKLYC tHiEKEDLSKN SQSSMIKHVA WLIFTNCIFF CPVAFFSFAP IilTAISlSPE 
IHKSVTLIFP PLPACLMPVL YVPFHPKFKE DHKLLKRRVT KKSQSVSVSI SSQGGa.EQD 
PYYDCGMYSH LQGHLTVCBC CESFLLTKPV SCKHLIKSHS CTALAVASCQ RPEGYWSDCG 
TQSAHSDYAD EEDSFVSDSS DQVQACGRAC FYQSRGFPLV RYAYHLPRVK D 



ATGCTGCGAG CCGCAGTGAT CCTGCTGCTC ATCAGGACCT GGCTCX3CGGA GGGCAACTAC 
A TCCCGAAATT CCACTTCGAG TTCTCCTCTG CTGTOCCCGA AGTCGTCCTG 
;CTGTGG TTCAflAAGAT TTTGGACAGG 
r CCGCCTGAGA CCGAATTTTG GAGGTGCCCC TGTGCCTGTG 
A CAGATCTCAG AAATGAATAT GGACTACACG 



CTGAACTTGA CCCTGGACTA TCGGATGCAT GAGAAGTTGT GGGTCCCTGA CTGCTACTTT 
TTGAi^CAGCA AGGATGCTTT CGTGCATGAT GTGACTGTGG AGAATCGCGT GTTTCAGCTT 
CACCCAGATG GAACGGTGCG GTACGGCATC CGACTCACCA CTACAGCAGC TTGTTCCCTG 
GATCIGCATA AATTCCCTAT GGACAAGCAG GCCTGCAACC TGGTGGTAGA GAGCTATGGT 
TACACEGTTG AAGACATCAT ATTATTCTGG GATGACAATG GGAACGCCAT CCaCATGACT 
GAGGAGCTGC ATATCCCTCA GTTCaCTTTC CTOGGAftGGA CXSATTACTAG CARGGAGGTG 
TATTTCTACa CAGGTTCCTA CATAOSCCTG ATACTGAAGT TCCAGGTTCA GAGGGAAGTT 
AACAGCTACC TTGTGCAAGT CTACTGGCCT ACTGTCCTCA CCACTATTAC CTCTTGGATA 
TCGTTTTGGA TGAACTATGA TTCCTCTGCA GCCAGGGTGA CAATTGGCTT AACTTCAATG 
CTCATCCTGA CCaCCATCGA CTCACATCTG CGGGATAAGC TCCECAACaT TTCCrGTATC 
AAGGCCATTG ATATCTATAT CCTCGTGTGC TTGTTCTTTG TGTTCCTGTC CTTGCTGGAG 
TATGTCTACa TCMCTATCT TTTCTACAGT CGAGGACCTC GGOKCAGCC TAGGCGACAC 
AGGAGACCCC GAAGAGTCAT IGCCCGCTAC OSCTACCAGC AAGTGGTGGT AGGAAACGTG 
CAGGATGGCC TGATTAACX3T GGAAGACGGA GTCAGCTCTC TCCCCATCAC CCKAGCGCaG 
GCCCECCTGG CAAGCCCGGA AAGCCTCGGT TCTTTGAOST CCACCTCCGA GC3«3GCCCAG 
CTGGCCACCT OSGAAAGCCT CAGCCCACTC ACTTCTCTCT CAGGCCAGGC CCCCCTGGCC 
ACTGSAGAAA GCCTGAGCXSA TCTCCCCTCC ACCTCAGAGC AGGCCCGGCA CAGCTATGGT 
GTTCGCTTTA ATGGTTTCCA GGCTGATGAC AGTATTTTTC CTAOCGAAAT CCGCAACCGT 
SCCATGG TGTTACCCAT GACCATGAAG ATTCCAATGA GAGCTTGAGC 
C GCCATGGCCA TGGCCCCAGT GGGAAGCCCA TGCTTCACCA TGGCGAGAAG 
GGTGTGCAAG AAGCAGGCTG GGACCTTGAT GACAACMTG ACAAGAGCX3A CTGCCTTGCC 
ATTAAGGAGC AATTCAAGTG TGATACTAAC AGTACCTGGG GCCTTAATGA TGATGAGCTC 
ATGGCCCATG GCCAAGAGAA GGACAGTAGC TCAGAGTCTG AGGATAGTTG CCCCCCAAGC 
CCTGGGTGCT CCTTCACTGA AGGGTTCTCC TTCGATCTCT TTAATCCTGA CTACGTCCCA 
AAGGICGACA AGTGGTCCOS GTTCCTCTTC CCTCTGGCCT TTGGGTTGIT CAACATTGTT 
TACTGGGTAT ACCATATGTA TTAG 

Seq ID NO: 38 Protein sequence 



MLRAAVILIiL IRTWLAEGNY PSPIPKFHFE FSSAVPEWL NLPNCKHCM EAWQKILDR 
VLSRYDVRIiR PNFGGAPVPV RISIYVTSIE QISEMNMDYT ITMFFHQTMK DSRLAYYETT 
D VTVEHRVFQL H 
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DLHKFPMDKQ ACNLWESYG YTVEDIILPW DDNGNAIHMT EEmiPQFTF LGRTITSKEV 
YFYTGSYIRL ILKFQVQREV NSYLVQVYWP TVLTTITSWI SFWMHYDSSA ARVTIGLTSM 
LILTTIDSHL RDKLPNISCI KAIDIYILVC LPFVFLSLLE YVYINYLPYS RGPEEQPRRH 
RRPRRVIARY RYQQWVGNV QDGLINVEDG VSSLPITPAQ APLASPESLG SLTSTSEQAQ 
LATSESLSPIi TSLSGQAPLA TGESLSDLPS TSEQARHSYG VRFNGFQADD SIFPTEIRNR 
VEAHGHGVTH DHEDSNESLS SDERHGHGPS GKPMLHHGEK GVQEAGWDLD DHNDKSDCLA 
IKEQFKCDTN STWGLNDDEL MMGQEKDSS SESEDSCPPS PGCSFTEGFS F 




CAAAAATTGT GCAAATGAAG C 

CGATGTCCGC CTGAGACCGA ATTTTGGANN NATGCTTGCT ACTAACAGTA C 
TAATGAAGAT GAGCICATGG CCCATGGCCA AGAGAAGGAC AGTAGCTCAG AGTCTGAGGA 
TAGTTGCCCC CCAAGCCCTG GGTGCTCCTT CACTGAAGGG TTCTCCTTCX3 ATCTCCTTAA 
TCCTGACTAC GTCCCAAAGG TCGACaAGTG GTCCCGGTTC CTCTTCCCTC TGGCCTTTGG 
GTTGTTCaAC ATTBIAGCGG CCGAACGATG C 

Seq ID NO: 40 Protein sequence 



Coding sequence: 



C CGCGCCGCCG CCGCCACCGC CCGCACTCCG CCSCCTCTGC 
CCGCAACCGC TGAGCCATCC ATGGGGGTCG CGGGCCGCAA C 
CGGTGCTGCT GCTGCTGCTG CTGCTGCCGC CACTGCTGCT G 
CGGGTCGGGG CCGTGCCGCG GGGCCGCAGG AGGATGTAGA I 
ATGACTGCCA TGCCGACGCC CTGTGTCAGA A 
AGCCTGGCTA CCAAGGGGAA GGCAGGCAGT GTGAGGACAT O 

TCAATGGAGG CTGTGTCCAT GACTGTTTGA ATATTCCAGG CAATTATCGT TGCACTTGTT 
TTGATGGCTT CATGTTGGCT CATGACGGTC ATAATTGTCT T' 
AGAACAATGG CGGCTGCCAG CATACCTGTG TCAACGTCAT B 
GCAAGGAGGG GTTTTTCCTG A 
GCCTGAGCTG CATGAATAAG GATCACGGCT G 
GCAGCGTCGC CTGTGAGTGC A 
TCTTGACCTG TAACCATGGG A 
GCCCAGAGTG CAGCTGCCRT CCACAGTACA AGATGCACAC A 

AGCGAGAGGA CACTGTCCTG GAGGTGACAG AGAGCAACAC CACATCAGT3 GTGGATG3GG 
ATAAACGGGT GAAACGGCGG CTGCTCATGG AAACGTGTGC TGTCAACAAT GGAGGCTGTG 
ACCGCACCTG TAAGGATACT TCGACAGGTG TCCACTGCAG TTGTCCTGTT GGATTCACTC 
^GATA TTGATGAGTG CCAGACCCGC AATGGAGGTT 
r CTGCAAAAAC ATCGTGGGCA GTTTTGACTG OSGCTGCAAG AAAGGATTTA 
C AGATGAGAAG TCTTGCCAAG ATGTGGATGA GTGCTCTTTG GATSGGACCT 
GTGACCACAG CTGCATCAAC CACCCTGGCa CATTTGCTTG TGCTTGCAAC CGAGGGTACA 
CCCTGTATGG CTTCACCCAC TGTGGAGACa CCAATGAGTG CAGCATCAAC AAOSGAGGCT 
GTCAGCAGGT CTGTGTGAAC ACAGTGGGCR GCTATGAATG CCAGTGCCAC CCTGGGTACA 
AGCTCCACTG GAATMAAAA GACTGTGTGG AASIGAAGGG GCTOTGCCC ACAAGTGTGT 
CACCCCS3TGT gtccx:tgc3«: TGCGGTAAGA GTGGTGGSGG AGAOSGGTGC TTCCTCAGAT 
GTCACTCTGG CATTCACCIC TCTTCAGATG TCACCACCAT CAGGACAAGT GTAACCTTTA 
AGCTAAATGA AGGCaAGTGT AGTTTGAAAA ATGCTGAGCT GTTTCCCGAG GGTCTGCGAC 
CAGCACTACC AGAGfiAGCAC AGCTCAGTAA AftGAGAGCTT CCGCTAOGTA AACCTTACAT 
GC3«3CTCTGG CAAGCAAGTC CCAGGAGCCC CTGGCCGACC AAGCRCCOCT AAGGAAATGT 
TTATCACTGT TGAGTTTGAG CTTGAAACTA ACCRRAAGGA GGTGACAGCT TCTTGTGACC 
TGAGCTGCAT CGTAAAGOSA ACCGAGAAGC GGCTCCGIAA AGCCATCOSC ACGCTCAGAA I 
AGGCCGTCCA CAGGGAGCftG TITCACCTCC AGCTCTCAGG CATGAACCTC GACGTGGCTA 
AAAAGCCTCC CAGAACATCT GAACGCCAGG CAGAGTCCTG TGGAGTGGGC CAGGGTCRTG 
CAGAAAACCA ATGTGTCAGT TGCAGGGCTG GGACCTATTA TGATGGAGCA CGAGAACGCT 2040 
GCATTTTATG TCCAAATGGA ACCTTCCAfiA ATGAGGAAGG ACAAATGACT TGTGAACCAT 2100 
GCCCAAGACC AGGAAATTCT GGGGCCCTGA AGACCCCAGA AGCTTGGAAT ATGTCTGAAT 2160 
GIGGAGGTCT GTGTCAACCT GGTGAATATT CTGCAGATGG CTTTGCACCT TGCCAGCTCT 2220 
GTGCCCTGGG CACSTTCCAG CCTGAAGCTG GTCHAACTTC CTGCTTCCCE TGTGGAGGAG 2280 
GCCTTGCCAC CAAACATCAG GGAGCTACTT CCITTCAGGA CTGTGAAACC AGAGTTCAAT 2340 
GTTCACCTGG ACAITTCTAC AACACCACCA CTCACCGATG TATTCGTTGC CCAGTGGGAA 2400 
CATACCAGCC TGAATTTGGA AAAAATAATT GTGTTTCTTG CCCAGGAAAT ACTACGACTG 2460 
ACTTTGATGG CTCCACAAAC ATAACCCAGT GTAAAAACAG AAGATGTGGA G 
GAGATTTCAC TGGGTACATT GAATCCCCAA ACTACCCAGG CAATTACCCA G 
AGTGTACGTG GACCATCAAC CCACCCCCCA AGCGCCGCAT CCTGATCGTG G 

TCTTCCTGCC CATAGAGGAC GACTGTGGGG ACTATCTGGT GATGCGGAAA ACCTCTTCAT 2700 
CCAATTCTGT GACAACATAT GAAACCTGCC ASACCTACGA ACGCCCCATC GCCTTCACCT 2760 
CCAGGTCAAA GAAGCTGTGG ATTCAGTTCA AGTCCAATGA AGGGAACAGC GCTAGAGGGT 2820 
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GAGATGGCAG GCTCTATGCA TCTGAGAACC ATCAGGAAAT ACTTAAGGAT AAGAAACTTA : 
TCAAGGCTCT GTTTGATGTC CTGGCCCATC CCCAGAACTA TTTCAAGTAC ACAGCCCAGG 

AGTCCCGAGA GATGTTTCCA AGATCGTTCA TCCGATTGCT ACGTTCCAAA GTGTCCAGGT 

GGTTGGTGGG ACAGAGCTGT CTTCCTTCTG CATGTCAGCA CAGTCGGGTA TTGCTGCCTC 
r GACTCATTAG AGTTCAATTT TTATAG? 
r TTTCTTTCCC AGCATCGTGG ATGTAGACTG A 

CAGCTTCTCA CTGCTGTGGG CGGATGTCTT GGATAGATCA CGGGCTGGCT GAGCTGGACT 3360 

TTGGTCAGCC TAGGTGAGAC TCACCTGTCC TTCTGGGGTC TTACTCCTCC TCAAGGAGTC 3420 

TGTAGTGGAA AGGAGGCCAC AGAATAAGCT GCTTATTCTG AAACTTCAGC TTCCTCTAGC 3480 

CCGGCCCTCT CTAAGGGAGC CCTCTGCACT CGTGTGCAGG CTCTGACCAG GCAGAACAGG 3540 

CAAGAGGGGA GGGAAGGAGA CCCCTGCAGG CTCCCTCCAC CCaCCTTGAG ACCTGGGAGG 3600 

ACTCAGTTTC TCC3VCAGCCT TCTCCAGCCT GTGTGATACA AGTTTGATCE CAGGAACTTG 3660 

AGTTCTAAGC AGTGCTOSTG MAARAAABA GCAGAAAGAA TTAGRAATAA ATMAAACTA 3720 
ASCACITCTG GAGACAT 



| | | | | |

MGVAGRNEPG AAWAVLLIiL LLPPLLIiLAG AVPPGRGRAA GPQBDVDBCA QGLDDCHADA 
LOQNTPTSYK CSCKPGYQGE GRQCEDIDEC GNELNGGCVH DCMIPGNYE CTCFDGFMIiA 
HDGHHCLDVD ECIiEMIIGGCQ HTCVNVMGSY ECCCKEGFFL SDHQHTCIHR SEEGLSCMNK 
DHGCSHICKE APRGSVACEC RPGFELAKNQ RDCII;TC3)EG NGGCQHSCDD TADGPECSCH 

PQYKMEUDGR scleredtvl evtesnttsv vdgdkrvker llmetcavw ggojrtckdt 

STGVHCSCPV GFTLQLDGKT CKDIDECQTR NGGCDHPCKN IVGSEDCGCK KGFKLLTDEK 
SCDDVDBCSL DRTCDHSCIN HPGTFACACN RGYTLYGFTH CGDTNECSIN HGGCQQVCVN 
TVGSYECQCH PGYKLHWNKK DCVEVKGLLP TSVSPHVSLH CGKSGGGDGC FLRCHSGIHL 
SSDVTTIETS VTPKLNBGKC SLKKAELFPE GLRPALPEKH SSVKBSFRYV HLTCSSGKQV 
PGAPGRPSTP KEMFITVEFB LETHQKEVTA SCDLSCIVKR TEKRLRKAIR TLRKAVHREQ 
FHLQLSGMML DVAKKPPRTS ERQAESCGVG QGHAEHQCVS CRAGTYYDGA HERCILCBHG 
TFQNEEGQMT CEPCPRPGNS GALKTPEAWN MSBCGGLCQP GEYSADGFAP CQLCALGTFQ 
PEAGRTSCFP CGGGLATKHQ GATSFQDCET RVQCSPGHFY NTTTHRCIRC PVGTYQPBFG 
KNNCVSCPGN TTTDFDGSTN ITQCKNRRCG GELGDFTGYI ESENYPGNYP ANTECTWTIN 
PPPKRRILIV VPEIFLPIED DCGDYLVMRK TSSSNSVTTY ETCQTYERPI AFTSRSKKLW 
S ARGFQVPYVT YDEDYQBLIE DIVEDGRLYA SENHQEILKD KKLIKftLEDV 



TTTCTTCATT TTATGCTTTT 
: ATTAATTCTC 
4 CTACTACCTG 
CTCTCCTTTC TACTGGCATC 

3 CTCCTTGCCA 



AAACTTTAAA ATACAAAGCC 




TCACTTTCCA A 

TTAAAATTGT ATAAAAGAGA AGAAATTTAA GAIATTGAAA A 
AAATAAAGCT GGTITTGGAA GAGCAGTGGC CACTGTGATT GACAATGGGG GCACTTACTG 
TTAAGGGGAT TTATAACRGA AGTACTTGAA CAGAATTGTG AAGAGAATAG AATTGTGCAT 
TCTTTTATCT GCCCAGAACC ACAGCTCCCa TGGGAAATAC TCCAOCTCAT TCTACAACCT 
TCTGGCTGCA ACAAAAGCAG TCAAATTAAA ACATAADCCA AAGOSGGTAC CTAACCCAAC 
TTGAGAAAAT CATAGCATNC TCCCTTTGGC TATAACTNTT TCCACaTGAA ATACATTCAA 
ATGCCTT 



Nucleic Acid Accession #: CAT cluster 



| | | | | |

TTTTTTTTTT TTTTTTTGGA TTTTAGTATG CCTTGCAATT TTTTCCCTTT ATTCTGATGC 
ATGAAGTACC CACTAAAAGT GACTGCTGTT AGTATAGCTT CAGTAATGAG GTGATGAGGT 
GACAGGGCAG GTGATGCTCT CTTAGTCTCT TTAGGCTACT ATTACAAAAT ACTTCAGACT 

GAGTAATTCA TAAACAACAG AGATTATTGT TCACAGATCT GGAGGCTGGA AAGTACAflGA 
CTAAAGGGCC AGAATATTTG GTGTTTGGTG AAGGTCAAAC ATTCftGACAC TCTCAACGAC 
TATAGCGACA GCAGCAGTCT TCAGGAATCC TATGTGAGGG ACAAACACTC AGAAGCCAGC 
TGGAGTGTTC TAGAATCCTA TGTGAGGGAC AAACATTCAG ACCCCAGCAG TAGTGTTGTG 
GAATCCTATG TGAGGGACAA ACTTTCAAAC CCTTGTAGCA GTCTTCTGGA ATCCTATGTG 
AGGGACAAAA ATTCAGAACC TTGTAGCAGT GTTCTGGAAT CCTATGTGAG G 
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G CGCGCCTTCC T 

GACGACCGGC GGCGCGGCTG CGTGCACCTG CGCACCTCCT TCTTGAAGOS CCGACGGCGG 
GCCCCCCGQG ACCCCACGCG CGCCCCGGCC CGGCCCGGGG ATCAGCCGCC GCCGCCGCCG 
CAAGCGCCTT GGTGTTCGCT CCGGCCGACG AGCCGCGGAC GGTCCTGGAG AGGAAGCCCC 
TGCCCCTGGG CGTGCGCGCC CCTCTGGCCG GTCCCAGCGC C 
AGCTGTGCGC CCCGGCTGAG GCGGCGCCCT G 
CGGCGCTGGA ACCGAGCTCC AGCGCGGACG CAGGGACGGG ACCGGGGAGC GGCTCTTCGT 



GCGGTGCAGT GGCGTGCAGG G 
TGCCCGAGGG GAACGTCGGA GGCACACCAT CGCCAGCGGC GTG3ACTGCG GCCTGCTGAA 
GCAGftTGAAG GAGCTGGAGC AGGAGAAGGA GGTGCTGCTG CAGGGTTTGG AGATGATGGC 
GCGGGGCCGC GACTGGTACC AGCAGCAGCT GCAACGAGTG CAGGAGCGCC AGCGCCGCCT 
GGGCCAGAGC AGAGCCAGCG CCGACTTTGG GGCTGCAGGG AGCCCCCGCC CACTGGGGCG 



CTCACCCCCG GTCTGGCAGC AGCAGACCAT CCTCATGCTG AAGGAGCAGA ACCGACTCCT 
CACCCAGGAG GTGACCGAGA AGAGTGAGO; CATCACX3CAG CTGGAGCAGG AGAAGTCGGC 
GCTCaTTAAG CftGCTGTTTG AC3GCCCGCGC CCTGAGCCAG CflGGRCGGGG GaCCTCIGGA 
TTCCaCCTTC ATCTAGTCCT TGTGGGCCGC GTGGGCCCCC AGGGCCAGCC TGGCACTCAG 
CCCTTCGAGG GTGGGCGCCC CRTCGCACCC flCCCTCICTG GCTGGRGftCC COCGGC2M3GC 
CCAGGCACAG TOCCGGAGTG GGOKXITTCC TGCCGCCCTT GCCAGATGGG CTCCCCflGGC 
CTGCCCCCGG CTGGTCCCCG CACCGftGCGC TTGACTCCGT TTTGGCTCCT GGTTGYTGAC 
ATGGGCTGGG GGCTCTCTTG AGTCCX3CATA GTCCGCAGCT ACTACTGGCC GCIGTCAGTG 
GACAGTGGGG TACCCCTCCA TGAGTTAGCG TCCCCCCX3TT TCCAGCGGTG CCGCXICIGGG 
TCCCATCTTC AGGGAAAGGC ACTGCCCAOG Ct3«3GCrGCA CTTCCAACaA CGGGCAGCAG 
AGGGCGCGGG GCGGCTCCGA CXK3GGGTCCR AGGGCAGCTT CCCGCTCAAC CAGGGCftCCA 
GGACGAGGTG GCTGTAGCTC GGACGGACGG AAGTAGATGG AGGQGGTGGG GACGGCCTGT 
AAGCGGGGGG TGCCTGCCTG GCTGGGGAGC CCCAGGGATA GCGCTCGGAC TTCAGGTTCT 
GGCCSAGGCT GRGGGACCCT GGCTGCAGCG GATCGGCAOS CCGGGTGGGC GAGAGCITGG 
CCTGCATGTG CCTCCCaCAG ACCCIGGGGT GATGGCXTTC CCCCTCTTGG CCGGGftQSTT 
GCCCCACGTT GAGTCCCACA CAACATCCTG TGAGCCTGGC TCCCCAGGAG GGCCCCCAGA 
A GGCAAAGCCT GTTTCCCCCG ACTCAGGATT TCCAAGGCCT 
T TCaOCTGGGA 
G GAGGCTGGGG 

CAGGTCCCCT TGGGTGTCAC TCCCTCAGCC CCTGCCCAGG CCCACTCCCG CTGGTGCTGG 



ATCGATGGGT TCTCa\GCCC R 



AGTGTGTGTG GGGCGCAGGG CCTCCGATGC GGGGTCAGTG CGTGGGGGGC GCAGGGCCCC 
CGATGCGGGG TCAGTGCGTG GGGGGCGCAG GGCCCCCTCG TGTCCAGGGC ftCTTTGGTAC 
ACTGTCCCAC AAGGCACCTG TCTCAGAGGA CCGGCCCTGG CAGGCAGCGT GGCAACTCCT 
TCCGGAGCCC AGCTCCATGC TAACCTGCCC ACAGCAACCC CACAGAGCCA CATTCCCTGC 
TGCACCTGGT CTGCAGGGTG TCCCAOGACA GGCCCAAGTC RGCCCRGCAT GCAGCTGCCC 
TCCTACCCTG AAGATGGGAG TGGGCTTTCC AGGGGACATA AGGATGTCAG 
CCTGGGCAGG AAAGGGTGCA GGTCCTGAGG GCCTGTGCCC CACAGCCCCA 
GGACTGCAGC GCRGTGGGTG GGCCRGTGGC AGCCAGGGAG AAGCCCCCCG TCAGCAGGCT 
GGGGTCTGCC CACCAGGGCC TCCCCACGTC TGCCTTTGAG GGTGCCTG 
GGATCCTGGC ATCTTTACTG GACTGGAAGC AGGAGACAGA ACAGTG7C 
ACTTCATCAG GAGACCGCCC ACATAGAGCT GGACCCCGCA GCTGAAGCGG A 
CAGGCTGGCA CCTCCGGAAA RACTGCCTTT CAGCCTTGGT GTTCCGTGCA AGGTGAAAAG 
laVGTTTA CAGCTTGAAA TCAGGCTAGT GAGTGGCCCT G 
I TTAAAGGCCC CGGCTGGCAG GGTCTAGGTG GCTGGCAGAG G 
ACCCIGCCTG GAGCCTGCCC TAGGftCGCTG GGCGGGTCAG TCTCCGTGCA GGATGTGAGC 3120 
AGOSTCCCTG GGCTCTATCC GCGAGGTGCC AGTAGCGTGT GCAGGTACAT ACACGTGCGT 3180 
GCACaCTGTG ATGACACCCG GAAATGTCTC AGGATGTTGA AATGTGTCCT TGGGGGCAGA 3240 
AGTGTCCCCA GTTGAGAATC TGCCXKAGAG GAACACACCC ACACCAGGCC TCAGGATTTT 3300 
GTGTTGATCA AGTTCCiAAGG AAAAGGAACA TCTCAGCCGG GCGTGGTGGT TCACGCCTGG 3360 
AATCCCAGCA CTTGAGGCCA GGAGTTCCAG AGCaGCCTGG GCAAOGCAGT GAGAGAOCCC 3420 
ATCTCTACM RAARAAAflAA AGAAAGAAAG AAAATGAGAG ATCCAGGTTT AAAAATTCAT 3480 
AAACACCACA AGGAAACAAT ACACTATGAG ACCCAGCAGA AGCAACAGAT TGACTCTAGA 3540 
CCCAGATACT AGAATTATCA GAGAGAATAT AAAGTAACAG TGTTTTATAT ATCTAAAGAA 3600 
ATAAAAGAGA TTTCTGGAAA CATGAAAAAA AA 

Seq ID NO: 46 Protein sequence 



| | | | | |

-JARASNS CLMSflDPSCS SCVMRSLFSV TSWVRSRFCS FSMRMVC3C0Q 

3 QGGPEEDGGR ARLAQAAASS SPEHRATSCT LGSSRPSGRG LPAAPKSALA 

S CTRCSCCWYQ SRPRAIISKP CSSTSFSCSS SFICFSRPQS TPIiSMVOiRR 

SPRARGARRA SPESAPGPCT PLHRDKHEAL SLQTRRGALQ DPESTKSRSP VPSLRPRWSS 

VPAPRSGTAR APRGRAPPQP GRTAAPGOGR RRWDRPBGRA RPGAGASSPG PSAARRPBRT 

PRRLRRRRRL IPGPGRGARG VPGGPPSALQ BGGAQVHAAA P 



CAAAGCTCTA AGTATGCTGG GACftGATACT ACAAATGAAC TTTATGATGA G 
CTGATTTATA GTCCTGTACT TTCTCTAOGT GCCATATCCA TTATTAAAGA A 
AGTAGGAAGT AGAGTTAACC TATAGTTTCA TTTCTTGAAT TTCTTATTCT CTTTCTTCAG 
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TCTTTTTCAG TTAACCTACA CACACACACA CaCACftCACA CACACACACA CACATATGTT 
3 ATGGGAGAAC GGGTAOSGTG ATAATTAAAA GAGGTftAGGT TTCTCTTGAG 
I TCTAAAATTG TGATGGCGGA TGCACACCTC TGAATATATT AAAAGCCATT 
GAAATGAAAA AAGGGTGGGG GGAATCCAAA AGTGTAGCAG ACCCAACCTT GACiATTTGCT 
TGTTTGGGAA TGAATTTTCC AATAACTTGA AAGTTGTAAA AACTCACACT TCTCAGGGTT 
AGGTGTCAGA AAGAAAAGGA AGTAATTTAT TCTTTAATAA AGCMTTCTT AAATACTCTT 
TAGAACTACC ACTGATTGCA ATTTTGCAGT GTCTACTCAT AGTGTCTATA TAGGTACCAT 
A ACTGTTCTCA TGTTACTTCA GAAAAATTTT GCTTCTAAOT 

\ AATGACAGTC TTCTCTTTGT ACTCCTCCCT GTACAACATC 
ACAGAGCTCC ATCTGTATAC ACGAAAGTCA CATGAAAATA GAACTCAGTG TTTTGTATTA 
CATAGTCTAT TCAGTACATT TAGAAGTATT TTGCCTCCAA TATTCAACCA CAGTAAAAGA 
CTCAGTGAGA ACGCGTGGTG GCGCTGCAGG TTAAGATGAC 
CftGCTCa«3G ATCCAGCAAA CCGTTTCCCA AAGCCTGGAA G 
GAGCGAACGT GAGTGTGAAA CCTCTTTAAG ACACCGTTGG 
TGGftCTGCAA AACAGTTCTA CTAGGATCCT GGGGATACAT GAAGCTTCTG T 

TTTCftAGAAA AAGCAATGGA GATTGGATGG ATGCACAATC GGAGACAAAG GCAAGTCCTT 1200 

GTTTTCTTTG TTTTGCTGAG CTTGTCTGGG GCGGGCGCCG AGTTGGGGTC CTATTCCGTA 1260 

GTGGAAGAAA CGGAGAGAGG CTCTTTTGTG GCAAATCTAG GAAAAGACCT GGGGTIGGGG 1320 

TTGACftGftOA TGTCCACCCG CAAGGCCAGG ATCATTTCCC AGGGGAACAA ACAGCAT7TG 1380 

CAGCTCftAOG CTCAAACTGG GGATTTGCTC ATAAATGAGA AGCTAGATCG AGAGGAGCTA 1440 

TGOKTCCCA CTGAGCCTTG CATACTACAT TTCCAAGTGT TAATGGAAAA CCCTTTAGAA 1500 

ATATTTCftOG CTGAACTGAG GGTOATAGAT ATAAATGACC ATTCTCCCAT GTTCACTGAA 1560 

AAGGAAATGft TTCTAAAAAT ACCGGAAAAC AGTCCTCTAG GAACTGAGTT CCCTCTGAAT 1620 

CATGCTTTGO, ACTTGGACGT AGGRRGCAAT AATGTTCAAA ACTATAAAAT CAGCCCAAGC 1680 

TCTCRTTTCC GGGTTCTAAT CCATGAATTC AGAGATGGCA GGAAATACCC TGAGCTAGTG 1740 

TTGGRTAAAG AGCTGGRTCG GGAGGRGGAG CCTCAACTAA GATTAACCCT GRCAGCGCTG 1800 

GATGGTGGCT CTCCACCGCG ATCTGGAACT GCTCftGGTCC GTATTGAAGT GGTGGRCATC 1860 

ftATGRTAACG CTCCTGAGTT TGAGCAGCCC ATCTACAAAG TGCRG&TTCC AGftGAACAGT 1920 

CTTCTTGGCr CCCTGGTTGC CACCGTCTCC GCCftGGGATT TAGftCGGOSG AGCCAATGGft 1980 

AAAATATCAT ACaCACTCTT TCAGCXTTTCX! GAGGATATTA GTAAAACTTT GGAGGTAAAT 2040 

CCTATGACAG GGGAAGTTCG ACTGAGAAAG C3UIGTAG&TT TCGAAATGGT TACGTCTTAT 2100 

GftflGTGOSCA TCAAAGCCAC AGATGGGGGA GGTCTTTCftG GAAAGTGCAC TCTTCTCCTG 2160 

CAGGTGGTGG ACGTGAATGA CAATCXXKCA CAGGTGACCA TGTCTGCaCT CftCXaGCCCC 2220 

ATCCCAGAGA ACTCGCCTGA GATAGTAGTT GCTCTTTTCA GOGTTTCAGA TCCTGACTCC 2280 

GGAAACAATG GGftAGACGAT TTCCTCCATC CaGGAAGACC TTCCCTTTCT TCTAAAACCT 2340 

TCAGICAAGA ACTTTTACAC CTTGGTAACG GAGAGAGCAC TCGftCAGRGR AGCAAGAGCT 2400 

GAATATAATA TCACCCTCAC CGTCACAGAT ATGGGGACTC CAAGGCTGRA AACGGAGCAC 2460 

AACATAACAG TGCAGATATC AGATGTCAAT GATAACGCCC CCACTTTCAC CCAAACCTCC 2520 

TACACCCTGT TCGTCCGCGA GAACAACAGC CCCGCCCTGC ACATOGGCAG CGTCAGOSCC 2580 

ACAGACAGAG ACTCAGGCAO CAACGCCCAG GTCACCTACT CGCTGCTGCC GCCCCAGGAC 2640 

CCGCACCTGC CCCTCGCCTC CCTGGTCTCC ATCAACGCAG ACAACGGCCA CCTGTTCGCC 2700 

: TGGACTACGA GGCCCTGCGG GAGTTCGAGT TCCGCGTGAG CGCCACAGAC 2760 

C CGGCTTTGAG CAGCGAGCCG CTGGTGCGCC TGCTGGTGCT GGACGCCAAC 2820 

GACAACTCGC COTTCGTGCT GTACCCGCTG CAGAACGGCT CCGCGCCKTG CACTGAGCTG 2880 

GTGCCCCGGG CGGCCGRGCC GGGCTACCTG GTGACCAAGG TGGTGGCGGT GGACGGCGAC 2940 

TCGGGCCAGA ATGCCTGGCT GTCGTACCAG CTGCTCAAGG CCACGGAGCC OSGGCrGTTC 3000 

3 CGAGGTGCGC ACCGCCAGGC TGCTGAGCGA GCGCGACGCA 3060 

3 AGCCTCCGOS CTCGGCCACC 3120 

GCCACGCTGC ACGTGCTCCT GGTGGACGGC TTCTCCCAGC CCTTCCTGCC GCTCCCAGAG 3180 

GCGGCCCCCG GCCAGACCCA GGCCAACTCG CTCACIGTCT ACCTGGTGGT GGCGTTGGCC 3240 

[■TTCGGTG CICCTGTTCG TGGCGGTGCG GCTGTGCAGG 3300 
3CCGCTGC TCGATGCCTG AGGGCCCCTT TCCAGGGCGT 
CTGGTGGACG TAAGCGGCAC CGGGACCCTG TCCCAGAGCT ACCAATACGA GGTGTGTCTG 
ACAGGAGGCT CAGAAACAAG TGAGTTCAAG TTCCTGAAGC OGATTATCCC CAACTTCTCT 
CCTTAGGGCA CTAGGAAAGA AATAGATTAA AATTCCACCC T 
AATTATTGAT AGGAACCCAT TTGATAAATT CCTTAACTTC TTATGATTGT C 

AAATTGTTCA TGCTCACCAC CACCAATAAG GTATTTTTCT CTGATTGTTA GTTCAAATTA 3660 

TATTGTTAAT TCCAGTTTCC CTTTTCCTCA TATTTACCCC GAAGAGGTGT TGCATATAGA 3720 

ATCCCAATTA hCPMAIhlk CTTTATCTTC AAAGTTGATG TCATTTAAAA TTTTTCCGTC 3780 

TTTATATTTT ATTTACITCC TATTCATTTT TTGCTCCATT TTTCATGTTA CTTCTCAGTT 3840 

TCCTflGAACT TCS^AGTATTA AAATAACCTG TTGCATGTAT TAOGCATftTT TCCTATGTTA 3900 

CATTTCTTTT GTCTATTTTC CTTTCAAAAT TGGTATTTTT GTIGGGCTC& ATTTTCATTA 3960 

TAATACTTTT CTTAAAGTTT CTTTCTTTCT TTTCTITTCT TTCTTTTTTT TTTTTTCCTT 4020 

TTTGftGAC3«3 GGTCTTftCTC TTGTCACCCA GGCTGGftGTG CAGTGGCaCA ATCTTGGCTC 4080 

ACTGCAACCT CTGCCTCCTG GGCTCAAOM ATCCTTCCAC CTCAGCCTCC CAACTAGCTT 4140 

GGACIATAGG TGCATGCCAC CATGCCTGGC TAATCTTTTG CAGCGATGAG ATTTTGCCAA 4200 

GTTGCECAGG CTGATCTTGA ACTCCTGGGC TCAAGCCATC CTCXKTCCTC AGCCTCCCAA 4260 

AATTCTGGGA TTACAGGCAT AAGCCAATGT GCCCATCCAA AGTTTTATTT ATTTATTTTT 4320 

TTGAGATGGA GTCTCGTAAA GTTACCTTTA AAAAftSAAGT TCTATTTTCC CTGTATTGGT 4380 

ATCTCCTTAA ATAAAATAAA ATATTCCIAT TGTAAGTGAT ATGAGAAATC TTTAACCAGC 4440 

CTTATCTAAA AATAAAAAGA GAAGCCATTG TAAGftCATTC AGTATGTGTA AATGTGTTTG 4500 

TGTTTGTAGA CAAAAGGCAA AGGTATTATG TAAAAATATT TAATAATTTA TTCTTTCTAT 4560 

TACTGAATTA AAAAATCAGA GGTCCCTGTT ATATTTTTAA TGGCTAACRA CTCAMCTta 4620 

TTAAGTTGGA AAAAAAACTT ATCAAAGAGA CATTTACATG GTTTGGCTTT TATATTCATC 4680 

ATAGTATACA TTGGCGGTAT CTAGCCCTTT CTCTGTAAAA TATCCCTATG TTTAATCTGT 4740 

ATTTCTTGCT TATTATATGT AAAGTTGAGC TTCTTTCTAG ATATTAGGCC TTTGRATAAA 4800 
ATTCTATGTG AGTCAGAAAA AAAAAAA 
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30 



| | | | | |

MEIGWMHNRR QRQVLVPFVL LSLSGAGAEL GSYSWEETE RGSFVMILGK ' 

TRKARIISQG NKQHLQLKAQ TGDLLINEKL DREELCGPTE PCILHFQVLM ENPLEIFQAE 

LIHEPRDGRK YPELVLDKEL DEEEEPQLRL TLTALDGGSP PRSGTAQi/EI E 

EFEQPIYKVQ IPBNSPLGSL VATVSAHDLD GGANGKI3YT LFQPSEDI3K T 

VRLRKQVDFE MVTSYEVRIK RTDGGGLSGK CTLLLQWDV I\"DN?FQVTMS A 

PEIWAVFSV SDPDSGNNGK TISSIQEDLP PLLKPSVKNF YTLVTERALD RSAPAEYNIT 

LTVTDMGTPR LKTEHNITVQ ISDVNDNAPT FTQTSYTLFV RENNSPALHI G3VSATDRDS 

GTNAQVTYSL LPPQDPHLPL ASLVSINADN GHLFALRSLD YEALREFEFR V3ATDRGSPA 

LSSEALVRVL VLDflMDNSPF VLYPLQNGSA PCTELVPEAA EPGYLVTKW AVEGDSGQIJA 

WLSYQLLKAT EPGLFGVWAH NGEVRTARLL SERDAAKQEL WLVKDNGEP PRSATATLHV 
LLVDGFSQPF LPLPEAAPGQ TQANSLTVYL WALASVSSL FLFSVLLFVA VRLCRRSEAA 

S GTGTLSQSYQ YEVCLTGGSE TSEFKFLKPI lENFSP 
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TTTTTTTTTG ATAATACACA GACTTTAATT AAAATTCTAC TAAAATTAAA TGTCTAAATA 
AATTAGRATG GTACATGGTA CATCTAAATG TATGTTTATA TATTTTATTT GTGCATTTTA 
TTCCTftGGGT TGCTTTTGCT TTAGTTTGTA AAACGTTCTT ATTTTTATGR TAATGTftGTA 
TATACTAAAT AAAGAAAAAT CAGGAAATAG AAAATGAAGA AGAAAACATT AGCTATTGTC 
AACCAAATAA AAATTGTGCA ATCTCTAAGC ACSOTGAACTA TGTATTATTT GTACAGCATG 
TAOUiTGTTT ATGCTTC31CA GGGTGflGGTA GAGACTGCAA AACATTGAAC CTGGGACAAA 
TAAGAAAGTA AGGAAATTTT CACAACATAT TAATATTATA GAAAATCTTG AACTTAACAG 
ITAAGATACA AGTAGTGAAA AATGATAGTA TTTAAGQAGA TCTAGAAAAT TTA 



Seq ID NO: 50 DNA sequence 

Nucleic Acid Accession #: AF034799.1 



GATTCOSGGA GGCAAGTGAG GAGAGAAGAT GCTGTAGCGT CCTCACCGGC TGCCAGCAGG 
3TGAG0C TCCCTTCTCC TCAAGCCGGA GACTGCGCTT 
\ GCAAGGACCC GAAATCRCAG ACATTAGCAA T 



TGCACTGACA 



GGCTCGCTGG TTCTAACGGG GCTGATCCAC 



CACTAAGAAT GACGGTGGTA AAACGGCAAG C 

CTCAGGAGTA TCCAGTGAAG TTGAAGTTCT CAAGGCACTG AAATCTTTGT TTGAGCACCA 720 

CAAGGCCTTG GATCAAAAGG TAAGGGAGCG ACTGAGGGTT TCTTTAGAAA GAGTCTCTGC 780 

ACTGGAAGAA GAACTAGCTG CTGCTAATCA GGAGATTGTT GCCTTGCCTG AACAAAATGT 840 

TCATATACAA AGAAAAATGG CATCAAGCGA GGGATCCACA GAGTCAGAAC ATCTTGAAGG 900 

GATGGAACCT GGACAGAAAG TCCATGAGAA GOSTTTGTCC AATGGTTCTA TAGACTCAAC 960 

OSATGAAACT AGTCAAATAG TTGAACTACA AGAATTGCTT GAAAAGCARA ACTATGAAAT 1020 

GGCCCAGATG AAAGAACGTT TAGCAGCCCT TTCTTCCCGA GTGGGAGAGG TGGAACAGGA 1080 

AGCAGAGACft GCAAGAAAGG ATCTCATTAA AACAGAAGAA ATGAACACCA AGTATC3\AAG 1140 

GGACAITAGG GAQGCCATGG CACAAAAGGA AGATATGGAA GAAAGAATTA CAACCCITGA 1200 

AAAGOSTTAC CTC3«3TGCTC AGAGAGAATC TACCTCCATA CATGACATGA ATGATAAACT 1260 

AGAAAATGAG TTAGCAAATA AAGAAGCTAT CCTACGGCAG ATGGAAGAGA AAAACAGACA 1320 

GrrTACAAGAA CGTCTTGAGC lAGCIGAAGA AAAGTTGCAG CAGACCATGA GAAAGGCTGA 1380 

AACCTTGCCT GAAGTAGAGG CTGAACTGGC TCAGAGAATT GCAGCCCTAA CCAAGGCTGA 1440 

AGAGACACAT GGAAATAITG AAGAACGTAT GAGACAITTA GAGGGTCAAC TTGAAGAGAA 1500 

GAATCAAGAA CTTCAAAGAG CTAGGCAAAG AGAGAAAATG AATGAGGAGC ATAACAAGAG 1560 

ATTATOGGAT ACGGTTGATA GACTTCTGAC TGAATCCAAT GAACGCCTAC AACTACACTT 1620 

AAAGGRAAGA ATGGCTGCTC TAGAAGAAAA GAATGTTTTA ATTCAAGAAT CACSAAACTTT 1680 

CAGAASGAAT CTTGAAGAAT CTTTACATGA TAAGGAAAGC TTAGCAGAAG AAATTGAAAA 1740 

GCTGAGATCT GAACTTGACC AATTGAAAAT GAGAACTGGC TCTTTAATTG AACCCACAAT 1800 

ACCAAGAACT CATCTAGACA CCTCAGCTGA GTTGCGGTAC TCAGIGGGAT CCCTAGTGGA 1860 

CBGOaGTCT GATTACAGAA CAACTAAAGT AATAAGAAGA CCAAGGAGBG GCCGCaiGGG 1920 

TGTGCGAAGA GATGAGCCAA AGGTGAAATC TCTTGGGGAT CAOSaGTGGA ATAGAACTCA 1980 

ACAGATTGGA GTACTAAGCA GCCACCCTTT TGAAAGTGAC ACTGRAATGT CTGATATTGA 2040 

TGATGATGAC AGAGAAACAA TTTTTAGCTC AATGGATCTT CTCTCTCCAA GTGGTCATTC 2100

CGATGCCCAG ACGCTAGCCA TGATGCTTCA GGAACAATTG GATGCCATCA ACAAAGAAAT 2160

CAGGCTAATT CAGGAAGAAA AAGAATCTAC AGAGTTGCGT GCTGAAGAAA TTGAAAATAG 2220 

AGTGGCTAGT GTGAGCCTCG AAGGCCTGAA TTTGGCAATG GTCCACCCAG GTACCTCCAT 2280 
TACTGCCTCT GTTACAGCTT CATCGCTGGC CAGTTCATCT CCCCCCAGTG 
TCCAAAGCTC ACCCCTCGAA GCCCTGCCAG GGAAATGGAT CGGATGGGAG

GCCAAGTGAT CTGAGGAAAC ATCGGAGAAA GATTGCAGTT GTGGAAGAAG ATGGTCGAGA 2460

GGACAAAGCA ACAATTAAAT GTGAAACTTC TCCTCCTCCT ACCCCTAGAG CCCTCAGAAT 2520 

GACTCACACT CTCCCTTCTT CCTACCACRA TGATGCTCGA AGTAGTTTAT CTGTCTCTCT 2580 
TGAGCCAGAA AGCCTCGGGC TTGGTAGTGC CAACAGCAGC CAAGACTCTC T 
CCCCAAGAAG AAAGGAATCA AGTCTTCAAT AGGACGTTTG TTTGGTAAAA A

TCGACTTGGG CAGCTCCX3AG GCTTTATGGA GACTGAAGCT GCAGCTCAGG AGTCCCTGGG 2760

GTTAQGCAAA CTCGGAACTC AAGCTGAGAA GGATCGAAGA CTAAAGAAAA AGCATGAACT 2820 
TCTTGAAGAA GCTCGGAGAA AGGGATTACC TTTTGCCCAG TGGGATGGGC C 
CGCATGGCTA GAGCTTTGGT TGGGAATGCC TGCGTGGTAC GTGGCAGCCT GCCGAGCCAA
CGTGAAGAGT GGTGCCATCA TGTCTGCTTT ATCTGACACT GAGATCCAGA GAGAAATTGG 
AATCAGCAAT CCACTGCATC GCTTAAAACT TCGATTAGCa ATCCAGGAGA TGGTTTCCCT 
AACAAGTCCT TCAGCTCCTC CAACATCTCG RACTCCTTCa GGCflACGTTT GGGTGACTCA 
TGAAGAAATG GAAAATCTTG CAGCTCCAGC AAAAACGASA GAATCTGAGG AAGGAAGCTG 
GGCCCAGTGT CCGGTTTTTC TACAGACCCT GGCTTATGGA GATATGAATC ATGAGTGGAT 
TGGAAATGAA TGGCTTCCCA GCTTGGGGTT ACCTCAGTAC AGAAGTTACT T 
CTTGGTAGAT GCAAGAATGT TAGATCACCT AACAASAAAA GATCTCCGTG T 
AATGGTGGAT AGTTTCCATC GAACAAGTTT ACAATATGGA A 




TGAAAGTGAT GACAAGAACT TCAGACGTGG ATCAACCTGG AGAAGGCaGT T 

TGAAGTACAT GGAATCAGCA TGATGCCTGG GTCCTCAGAA ACATTACCAG CTGGATTTAG 3840

GTTAACCACA ACCTCTGGGC AGTCAAGAAA AATGftCAACa GATGTTGCTT CRTCftAGACI 3900 

GCAGAGCTTA GRCAACTCCa CTGTTCGCAC ATACTCATGT TGACCAGCC& CTCAABGGAG 3960

SCftGCACTGA CCTGCTATGG CGTCTTTTCA GTCTACTCTA CCTAAAGTGC ACTACCATCT 4020 
AAGAAGACGA GCAGTGAAAA CCITTGTGAA AACTGAATTC 



1 11 21 31 41 51

| | | | | |

MMCEVMPTIN EDTPMSQRGS QSSGSDSDSH FEQLMVNMLD ERDRLLDTLR ETQESLSLAQ 60 

QRLQDVIYDR DSLQRQLNSA LPQDIESLTG GLAGSKGADP PSFAALTKEL MACREQLLEK 120 

BBBISBIiKAB RMNTRLLLBH LBCLVSRHBR aLRMTWKRQ AQ3PSGVSSE VEVLKRLK3L 180 

FBHHKAliDEK VRBRLRVSLE RVSAIiBEBLA AMQEIVALR EQNVHIQRKM ASSBGSTBSB 240 

HLEGMEPGQK VHEKRLSNGS IDSTDETSQI VELQELLEKQ NYEMAQMKER LAALSSRVGE 300 

VBQBABTARK DLIKTBBMNT KYQRDIRBAM AQKBDMBBRI TTLBKRYLSA QRESTSIHDM 360 

NDKLBNBLAN KEAILRQMBB KNRQLQERLE LABEKLQOTM RKABTLPEVE ABLAQRIAAL 420 

TKAEETHGNI EERMRHLEGQ LEEKNQELQR ARQREKMNEE HNKRLSDTVD RLLTESNERL 480

QLHLKBRMAA LBBKNVLIQB SBTPRKNLBE SLHDKESLAE EIEKLRSELD QLKMRTGSLI 540 

EPTIPRTHLD TSAELRYSVG SLVDSQSDYR TTKVIRRPRR GRMGVRRDEP KVKSLGDHEW 600

NRTQQIGVLS SHPFESDTEM SDIDDDDRET IPSSMDLLSP SGHSDAQTLA MMLQBOLDAI 660 

NKEIRLIQEE KESTELEAEE lENRVASVSL EGLNLAMVHP GTSITASVTA SSLASSSPPS 720 

GHSTPKLTPR SPARBMDRMG VMTLP3DLRK HRRKIAWEE DGRBDKATIK CBTSPPPTPR 780 

ALRMTHTLPS SYHNDARSSL SVSLEPESLG LGSANSSQDS LHKAPKKKGI KSSIGRLFGK 840 

KEKAELGQLR GFMETEAAAQ ESLGLGKLGT QAEKDRRLKK KHEJ.LEEARR KGLPPAQWDG 900 

PTWAWLELW LGMPAWYVAA CRftNVKSGAI MSALSDTBIQ RBIGISNPLH RLKLRLAIQE 960 

MVSLTSPSAP PTSRTPSGNV WVIHBBMBNL AAPAKTKBSE BGSWAQCPVF LQTIjAYGDMN 1020 

HEWIGNEWLP SLGLPQYRSY PMECLVDARM LDHLTKKDLR VHLKMVDSFH RTSDQYGIMC 1080 

LKRLNYDRKB LERRREASQH BIKDVLVWSN DRVIRWIQAI GLRBYANNIL ESGVHGSLIA 1140 

liDBNFDYSSL ALLLQIPTQN TQARQILBRB XMNIiLALGTB RRLDBSDDKN FRRGSTWRRQ 1200 
FPPREVHGIS MMPGSSETLP AGFRLTTTSG QSRKMTTDVA SSRLQRLDNS TVRTYSC
It is understood that the examples described above in no way serve to limit the true
scope of this invention, but rather are presented for illustrative purposes. All publications,
sequences of accession numbers, and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application were
specifically and individually indicated to be incorporated by reference.
WHAT IS CLAIMED IS:

1. A method of detecting an androgen-independent prostate cancer cell in a sample from
a patient having undergone androgen ablation therapy, the method comprising determining
the presence or absence of a nucleic acid comprising a sequence at least 80% identical to a
sequence as shown in Tables 1A-4.


2. The method of claim 1, wherein said determining is by hybridizing with a
polynucleotide that selectively hybridizes to a sequence at least 95% identical to a sequence
as shown in Tables 1A-4.

3. The method of claim 1, wherein the biological sample:

a) is a tissue sample; or 

b) comprises isolated nucleic acids. 

4. The method of claim 3:

a) wherein the nucleic acids are mRNA; or

b) further comprising the step of amplifying nucleic acids before the step of
contacting the biological sample with the polynucleotide.

5. The method of claim 2, wherein the polynucleotide:

a) comprises a sequence as shown in Tables 1A-4;

b) is labeled, including a fluorescent label; or 

c) is immobilized on a solid surface. 

6. The method according to claim 1, wherein said biological sample is contacted with a
plurality of polynucleotides that each selectively hybridizes to a sequence at least 95%
identical to a first sequence as shown in Tables 1A-4.

7. The method according to claim 6, wherein said plurality of polynucleotides are
immobilized on a solid surface. 

8. An isolated polypeptide which is encoded by a nucleic acid molecule having
polynucleotide sequence as shown in Tables 1A-4.

9. An antibody that specifically binds a polypeptide of claim 8.


10. The antibody of claim 9:

a) further conjugated to an effector component, including a fluorescent label, a
radioisotope, or a cytotoxic chemical; or

b) which is an antibody fragment or humanized antibody. 


11. A method of detecting an androgen-independent prostate cancer cell in a patient
having undergone androgen ablation therapy, the method comprising contacting a sample
from said patient with an antibody of claim 9.

12. The method of claim 11, wherein:

a) the antibody is further conjugated to an effector component, e.g., a fluorescent
label; or

b) said sample comprises a cell. 

13. A method of detecting antibodies specific to androgen-independent prostate cancer in
a patient having undergone androgen ablation, the method comprising contacting a biological
sample from the patient with a polypeptide encoded by a nucleic acid comprising a sequence
from Tables 1A-4.

14. A method of inhibiting proliferation of androgen-independent prostate cancer cells in
a patient having undergone androgen ablation therapy, the method comprising administering
to the patient a therapeutically effective amount of a compound that specifically eliminates
cells expressing an antigen listed in Tables 1A-4.

15. The method of claim 14, wherein the compound is an antibody.



16. A drug screening assay comprising the steps of: 
a) administering a test compound to a mammal having a prostate proliferative
condition or a cell isolated therefrom;

b) comparing the level of gene expression of a polynucleotide that selectively
hybridizes to a sequence at least 80% identical to a sequence as shown in
Tables 1A-4 in a treated cell or mammal with the level of gene expression of
the polynucleotide in a control cell or mammal, wherein a test compound that
modulates the level of expression of the polynucleotide is a candidate for the
treatment of prostate cancer.

17. The assay of claim 16, wherein:

a) the control is a mammal with prostate cancer or a cell therefrom that has not been
treated with the test compound; or

b) the control is a normal cell or mammal.

18. A method for treating a mammal having a prostate proliferative condition or prostate 
cancer comprising administering a compound identified by the assay of claim 16. 

19. A pharmaceutical composition for treating a mammal having a prostate proliferative
condition or prostate cancer, the composition comprising a compound identified by the assay
of claim 16 and a physiologically acceptable excipient.

20. A method of detecting a prostate cancer associated transcript, the method comprising
contacting a biological sample from the patient with a plurality of polynucleotides wherein at
least two of said polynucleotides selectively hybridize to a different sequence at least 80%
identical to a sequence as shown in Tables 1A-4.

21. A method of detecting a prostate cancer, the method comprising the steps of:

a) providing a biological sample from a patient;

b) contacting the biological sample with a first polynucleotide that selectively
hybridizes to a sequence at least 80% identical to a first sequence as shown in
Tables 1A-4, to determine the level of a prostate cancer-associated transcript
in the biological sample; and with a second polynucleotide that selectively
hybridizes to a second sequence at least 80% identical to a sequence not
shown in Tables 1A-4; wherein the expression of said second sequence is not
substantially changed in prostate cancer, to determine the level of expression
of a control transcript in the biological sample; and

c) comparing the level of the prostate cancer-associated transcript to a level of the
normal tissue associated transcript in the biological sample.



22. A method for quantitation of a prostate cancer-associated transcript in a cell from a
patient, the method comprising contacting a biological sample from the patient with a
polynucleotide that selectively hybridizes to a sequence at least 80% identical to a sequence
as shown in Tables 1A-4.



23. The method of claim 22, wherein:

a) the polynucleotide selectively hybridizes to a sequence at least 95% identical to a
sequence as shown in Tables 1A-4;

b) the biological sample is a tissue sample;

c) the biological sample comprises isolated nucleic acids;

d) the nucleic acids are mRNA;

e) further comprising the step of amplifying nucleic acids before the step of
contacting the biological sample with the polynucleotide;

f) the polynucleotide comprises a sequence as shown in Tables 1A-4;

g) the polynucleotide is labeled, including a fluorescent label; or

h) the polynucleotide is immobilized on a solid surface.



24. A biochip comprising a plurality of polynucleotides that selectively hybridize to a
sequence at least 80% identical to a sequence as shown in Tables 1A-4.



25. A method of screening drug candidates comprising: 

a) providing a cell that expresses an expression profile gene selected from the group
consisting of an expression profile gene set forth in Tables 1A-4 or fragment
thereof;

b) adding a drug candidate to said cell; and

c) determining the effect of said drug candidate on the expression of said expression
profile gene.

26. A method according to claim 22 wherein said determining comprises comparing the
level of expression in the absence of said drug candidate to the level of expression in the
presence of said drug candidate.